Chapter 07 · Games & Simulation

Games: Reinforcement learning milestones.

From board games to infinite virtual sandboxes, game environments remain the ultimate proving ground for general intelligence, testing long-horizon planning, spatial reasoning, and real-time execution.

Act I · Classic Reinforcement Learning

Mastering perfect- and imperfect-information games.

Modern game-playing AI progressed from brute-force chess search to deep neural networks that learn complex, real-time strategy games through reinforcement learning and self-play.

▸Deep Blue (1997): Defeated reigning world champion Garry Kasparov at chess by evaluating roughly 200 million positions per second with brute-force search rather than learning.
▸AlphaGo (2016): Defeated 18-time world champion Lee Sedol 4-1 in Go by combining deep learning with Monte Carlo Tree Search.
▸AlphaZero (2017): Mastered Chess, Shogi, and Go entirely from scratch via self-play RL, without using human games.
▸AlphaStar (2019): Reached Grandmaster level in StarCraft II, ranking above 99.8% of officially ranked human players, while handling imperfect information and real-time planning.
▸AlphaDev (2022): Discovered faster sorting algorithms in assembly code, integrated directly into the LLVM libc++ library.
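The tree search mentioned in the AlphaGo entry can be illustrated with a minimal sketch. The code below is plain Monte Carlo Tree Search with the classic UCT selection rule and random rollouts on a toy game (single-pile Nim: take 1 to 3 stones, whoever takes the last stone wins). It is not AlphaGo's method: AlphaGo guided the search with learned policy and value networks, which are replaced here by random playouts. The game, the names `Node`, `rollout`, and `mcts_best_move`, and the constants are all assumptions chosen for illustration.

```python
import math
import random


def legal_moves(pile):
    """Moves available in toy Nim: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile                  # stones left after `move` was played
        self.parent = parent
        self.move = move                  # move that led from parent to this node
        self.children = []
        self.untried = legal_moves(pile)  # moves not yet expanded
        self.visits = 0
        self.wins = 0.0                   # wins for the player who played `move`


def uct_child(node, c):
    """Pick the child maximizing mean value plus an exploration bonus (UCT)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(pile, rng):
    """Random playout; True if the player to move at the start takes the last stone."""
    starter_to_move = True
    while pile > 0:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return starter_to_move
        starter_to_move = not starter_to_move
    return False  # pile was already empty: the player to move has lost


def mcts_best_move(pile, iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node, c)
        # 2. Expansion: add one unexplored child, if any remain.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.pile - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: the opponent of the last mover is to move next.
        mover_wins = not rollout(node.pile, rng)
        # 4. Backpropagation: flip the perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if mover_wins else 0.0
            mover_wins = not mover_wins
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(mcts_best_move(3))
    print(mcts_best_move(10))
```

With enough iterations the search should tend toward moves that leave the opponent a multiple of four stones, which is the known winning strategy for this toy game. Replacing the random rollout with a learned value estimate, and weighting the exploration bonus by a learned move prior, is the general direction AlphaGo-style systems took.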

Act II · Modern Agent Simulations

Long-horizon planning in open-ended sandboxes.

Today's agents are moving beyond board games to master open-ended construction, resource gathering, and economic trade. These sandboxes serve as pre-deployment testbeds for physical robotics and agentic swarms.

▸Factorio Learning Environment: Evaluates long-horizon planning in open-ended logistics loops. Claude 3.7 Sonnet achieved a score of 29.1 in lab-play mode, establishing a new baseline.
▸Minecraft Economies: Project Sid (2024) ran simulations of 1,000 autonomous LLM agents in Minecraft, where they self-organized labor, formed a trade economy, and established social structures.
▸GTA V Mods (PedGPT): Deploys conversational LLM agents (such as Llama 3.1) into GTA V at runtime, giving characters unscripted autonomous behavior and memory.

Act III · Neural World Simulators

Generative physics replacing traditional game engines.

Rather than hand-building physics grids, developers are using generative world models to simulate continuous, interactive visual environments in real time.

▸Google Genie 3 (August 2025): Generates playable, interactive 3D worlds at 720p / 24 FPS with a 60-second temporal memory from a single text prompt.
▸World Labs Marble (November 2025): Turns text, photos, and videos into persistent, explorable 3D environments.
▸Decart DOS 2.0: A streamed, real-time neural game engine that generates pixels and game state continuously, frame by frame.
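The interaction loop these systems share can be sketched in a few lines: each new frame is predicted from a bounded window of recent frames plus the player's latest action. The sketch below is a toy under stated assumptions, not how any of the listed products is implemented. `ToyWorldModel` is a hypothetical stand-in for a learned generative model, a "frame" is reduced to a single integer position, and the window size simply multiplies the 24 FPS and 60-second figures quoted above.

```python
from collections import deque

FPS = 24             # frame rate quoted for Genie 3 above
MEMORY_SECONDS = 60  # temporal memory quoted for Genie 3 above


class ToyWorldModel:
    """Hypothetical stand-in for a learned model; a frame is just an integer."""

    def predict_next(self, context, action):
        # A real model would run a neural network over the recent frames.
        # Here the "world" is a 1D position nudged by the action.
        step = {"left": -1, "right": 1, "stay": 0}[action]
        return context[-1] + step


def play(model, first_frame, actions, fps=FPS, memory_seconds=MEMORY_SECONDS):
    """Generate one frame per action, conditioning on a bounded frame history."""
    # deque(maxlen=...) silently drops the oldest frame once the window is full.
    context = deque([first_frame], maxlen=fps * memory_seconds)
    frames = []
    for action in actions:
        frame = model.predict_next(context, action)
        context.append(frame)
        frames.append(frame)
    return frames, len(context)


if __name__ == "__main__":
    frames, window = play(ToyWorldModel(), 0, ["right"] * 2000)
    print(frames[-1], window)
```

At 24 FPS, a 60-second memory corresponds to a window of 24 × 60 = 1,440 frames, so anything generated earlier than that no longer conditions the next frame in this toy loop.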

Milestone Registry

David Silver's Reinforcement Learning Lineage

System	Year	Milestone Achievement	Significance
Deep Blue	1997	Defeated Garry Kasparov (Chess)	First computer to beat a reigning world champion in match play; search-based IBM system that predates the RL lineage.
AlphaGo	2016	Defeated Lee Sedol (Go)	Combined deep learning + Monte Carlo tree search; Move 37.
AlphaZero	2017	Mastered Chess, Shogi, Go	Self-play reinforcement learning without using human data.
AlphaStar	2019	Grandmaster (StarCraft II)	Real-time strategy with imperfect information; ranked above 99.8% of human players.
AlphaDev	2022	Discovered sorting algorithms	RL for code optimization; integrated into LLVM libc++.
