Chapter 07 · Games & Simulation

Games: Reinforcement learning milestones.

From board games to infinite virtual sandboxes, game environments remain the ultimate proving ground for general intelligence, testing long-horizon planning, spatial reasoning, and real-time execution.

01
Act I · Classic Reinforcement Learning

Mastering perfect and imperfect information systems.

Modern game intelligence progressed from brute-force chess search to deep neural networks that learn complex, real-time strategy games through reinforcement learning and self-play.

  • Deep Blue (1997): Defeated world champion Garry Kasparov at chess in a six-game match, searching roughly 200 million positions per second.
  • AlphaGo (2016): Defeated 18-time world champion Lee Sedol 4-1 in Go by combining deep learning with Monte Carlo Tree Search.
  • AlphaZero (2017): Mastered Chess, Shogi, and Go entirely from scratch via self-play RL, without using human games.
  • AlphaStar (2019): Reached Grandmaster level in StarCraft II, ranking above 99.8% of active players while handling imperfect information and real-time planning.
  • AlphaDev (2022): Discovered faster sorting algorithms in assembly code, integrated directly into the LLVM libc++ library.
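
The AlphaGo entry above pairs deep learning with Monte Carlo Tree Search. As a rough illustration of the search half only, the sketch below runs a plain UCT search (UCB1 selection, random rollouts) on a toy single-pile Nim game. It is not AlphaGo's method: there is no neural network, and the game rules and every name here (`legal_moves`, `Node`, `best_move`, `MAX_TAKE`) are assumptions chosen for illustration.

```python
import math
import random

MAX_TAKE = 3  # toy rule (assumed): remove 1-3 stones; taking the last stone wins


def legal_moves(stones):
    """Moves available with `stones` left; empty when the game is over."""
    return list(range(1, min(MAX_TAKE, stones) + 1))


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left after `move` was played
        self.parent = parent
        self.move = move          # move that led to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved into this node

    def ucb1_child(self, c=1.4):
        """Pick the child maximizing mean value plus an exploration bonus."""
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def best_move(stones, iterations=2000, seed=0):
    """Return the most-visited root move; requires stones >= 1."""
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.ucb1_child()
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout; track whether the player who
        #    moved into `node` is the one who takes the last stone.
        remaining = node.stones
        mover_wins = True
        while remaining > 0:
            remaining -= rng.choice(legal_moves(remaining))
            mover_wins = not mover_wins
        # 4. Backpropagation: flip perspective at each level up the tree.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_wins else 0.0
            mover_wins = not mover_wins
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(best_move(10))
```

When a single move takes the whole pile, that child scores a win on every visit, so the search concentrates its visits there. AlphaGo-style systems replace the random playout with learned value estimates and bias selection with a learned policy; none of that is modeled here.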

02
Act II · Modern Agent Simulations

Long-horizon planning in open-ended sandboxes.

Today's agents are moving beyond board games to master open-ended construction, resource gathering, and economic trade. These sandboxes serve as pre-deployment testbeds for physical robotics and agentic swarms.

  • Factorio Learning Environment: Evaluates long-horizon planning in open-ended logistics loops. Claude 3.7 Sonnet achieved a score of 29.1 in lab-play mode, establishing a new baseline.
  • Minecraft Economies: Project Sid (2024) ran simulations of 1,000 autonomous LLM agents in Minecraft, where they self-organized labor, formed a trade economy, and established social structures.
  • GTA V Mods (PedGPT): Deploys conversational LLM agents (such as Llama 3.1) into GTA V on the fly, giving characters unscripted autonomous behavior and memory.

03
Act III · Neural World Simulators

Generative physics replacing traditional game engines.

Rather than hand-building physics and rendering systems, developers are using generative world models to simulate continuous, interactive visual environments in real time.

  • Google Genie 3 (August 2025): Generates playable, interactive 3D worlds from a single text prompt at 720p / 24 FPS, with about 60 seconds of temporal memory.
  • World Labs Marble (November 2025): Turns text, photos, and videos into persistent, explorable 3D environments.
  • Decart DOS 2.0: A streamed, real-time neural game engine that generates pixels and game state on the fly.

Milestone Registry

From Deep Blue to David Silver's Reinforcement Learning Lineage

System | Year | Milestone Achievement | Significance
Deep Blue | 1997 | Defeated Garry Kasparov (Chess) | First computer to beat a reigning world champion in match play.
AlphaGo | 2016 | Defeated Lee Sedol (Go) | Combined deep learning + Monte Carlo tree search; Move 37.
AlphaZero | 2017 | Mastered Chess, Shogi, Go | Self-play reinforcement learning without using human data.
AlphaStar | 2019 | Grandmaster (StarCraft II) | Real-time strategy with imperfect information; ranked above 99.8% of players (EU).
AlphaDev | 2022 | Discovered sorting algorithms | RL for code optimization; integrated into LLVM libc++.