NeurIPS 2026 Competition Track

Foundation Agent Challenge 2026

One agent. A thousand games.

Train a single model and harness to play across text games, web games, and emulated games. The competition runs on Kaggle; this site is the landing page for rules, resources, and updates.

100 public-crawl games
1,000 private-crawl games
40B parameter limit

The Challenge

Can one agent play many games?

Participants submit one self-contained agent: a model up to 40B parameters plus a harness. The public crawl gives teams a broad training set; the held-out private crawl determines the primary ranking.

Scores are normalized per game, averaged within each runtime category, then averaged across categories with equal weight.
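The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page does not say how per-game normalization is done, so the min-max scaling against a reference low and high score per game is an assumption, and the names `aggregate_score`, `results`, and `baselines` are hypothetical rather than part of any official scoring code.

```python
from collections import defaultdict
from statistics import mean


def aggregate_score(results, baselines):
    """Combine raw per-game scores into one overall score.

    results:   list of (category, game_id, raw_score) tuples.
    baselines: dict mapping game_id -> (low, high) reference scores.
               ASSUMPTION: min-max normalization; the official
               normalization scheme is not specified on this page.
    """
    per_category = defaultdict(list)
    for category, game_id, raw in results:
        low, high = baselines[game_id]
        # Step 1: normalize each game's score to a common scale.
        normalized = (raw - low) / (high - low)
        per_category[category].append(normalized)

    # Step 2: average normalized scores within each runtime category.
    category_means = {c: mean(v) for c, v in per_category.items()}

    # Step 3: average the category means with equal weight.
    overall = mean(list(category_means.values()))
    return overall, category_means


if __name__ == "__main__":
    results = [
        ("text", "t1", 5.0),
        ("text", "t2", 10.0),
        ("web", "w1", 50.0),
        ("emulated", "e1", 0.0),
    ]
    baselines = {
        "t1": (0.0, 10.0),
        "t2": (0.0, 10.0),
        "w1": (0.0, 100.0),
        "e1": (0.0, 1.0),
    }
    overall, by_category = aggregate_score(results, baselines)
    print(by_category)
    print(overall)
```

Because categories are weighted equally, a category with few games counts as much as one with many. In the toy data above, the two text games together carry the same weight as the single web game or the single emulated game.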

Text Games

Interactive fiction, structured text observations, menu actions, and token-level decisions.

Web Games

Browser games rendered through JS and HTML5, controlled with mouse and keyboard actions.

Emulated Games

Console and handheld titles with video-frame observations and emulator button inputs.

Kaggle Hosted

Kaggle is where the competition happens.

Registration, containers, submissions, and live competition operations will run through Kaggle. Official competition links and materials will be posted here as they become available.

Preview

Built for generalist game-playing agents.

The crawl spans action, platformer, puzzle, RPG, simulation, strategy, adventure, racing, and survival games across multiple runtimes.

Timeline

2026 competition schedule

  1. Beta launch with public crawl, starter kit, baseline harness, and live leaderboard.
  2. Hybrid hackathon and official kickoff; rules and crawl freeze.
  3. Text games sprint.
  4. Web games sprint.
  5. Emulated games sprint.
  6. Final submissions due before private-crawl evaluation and report review.

Rules Snapshot

Single model, offline evaluation, no private-crawl targeting.

Open to everyone, with no team-size or affiliation restrictions.

Submissions must use one model with at most 40B parameters.

External resources required at inference time must ship inside the container.

Commercial or proprietary model API calls are prohibited during evaluation.

Training on, identifying, or reporting private-crawl games is prohibited.

Participants may submit up to three times per day.
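Teams may want to sanity-check the 40B-parameter rule locally before submitting. The sketch below counts parameters from a mapping of tensor names to shapes. The helper names and the input format are hypothetical, and how the organizers count parameters in practice (for example, for mixture-of-experts or tied weights) is not stated on this page, so treat this as a rough local check only.

```python
from math import prod

# Limit taken from the rules above: at most 40B parameters.
PARAM_LIMIT = 40_000_000_000


def count_parameters(shapes):
    """Sum the element counts of all tensors.

    shapes: dict mapping a tensor name to its shape tuple
            (hypothetical format; adapt to your framework's state dict).
    """
    return sum(prod(shape) for shape in shapes.values())


def within_limit(shapes, limit=PARAM_LIMIT):
    """Return True if the total parameter count does not exceed the limit."""
    return count_parameters(shapes) <= limit


if __name__ == "__main__":
    toy_model = {
        "embed.weight": (1000, 64),
        "head.bias": (64,),
    }
    print(count_parameters(toy_model))
    print(within_limit(toy_model))
```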

Resources

Starter Kit

Kaggle-compatible containers, unified observation and action interfaces, and public-crawl environments.

Baseline Harness

Continual Harness extended across the public crawl, with an open-source student baseline planned for launch.

Participant Support

Tutorials, FAQs, introductory webinars, Discord support, and GitHub Issues for reproducible bug reports.

Organizers

Organizing team

The challenge is organized by researchers across agent learning, game AI, multimodal evaluation, and competition infrastructure.

Seth Karten

Princeton University

Wayne Chi

Carnegie Mellon University

Jake Grigsby

UT Austin

Wenzhe Li

Princeton University

Chengshuai Shi

Princeton University

Alex L. Zhang

MIT CSAIL

Fei Fang

Carnegie Mellon University

Stephanie Milani

NYU / Johns Hopkins

Karthik Narasimhan

Princeton University

Kiran Vodrahalli

Google DeepMind

Yuke Zhu

UT Austin / NVIDIA Research

Chi Jin

Princeton University