Chapter 01 · Silicon Fundamentals

The Compute Core: Sovereign Silicon and Packaging.

The modern artificial intelligence ecosystem is fundamentally anchored to the physical limits of semiconductor fabrication. As transistor scaling approaches atomic boundaries, progress is defined by packaging architectures, memory bandwidth, and numeric precision.

TSMC 2nm Node Wafer Price

$30,000

A 50% increase over N3's ~$20,000 wafer cost, driven by GAA complexity.

Rubin Memory Bandwidth

22 TB/s

2.75× Blackwell's 8.0 TB/s, powered by the industry's first HBM4 integration.

CoWoS CAGR (2022-2027)

>80%

TSMC's projected annual growth rate for chip-on-wafer-on-substrate packaging.

TSMC SoIC Capacity CAGR

>90%

Projected annual system-on-integrated-chips stacking capacity growth.

Transistor Scaling & GAA Nanosheets

Taiwan Semiconductor Manufacturing Company (TSMC) began volume production of its **2nm (N2) node in Q4 2025**, marking the industry's transition from FinFET architectures to **Gate-All-Around (GAA) nanosheet** transistors.

Compared to N3E, the N2 node delivers **10–15% performance gains at iso-power**, or **25–30% power reduction at iso-performance**, alongside a **15% density uplift** for mixed designs (up to 20% for logic-only components). The gains come at a price: N2 wafers now cost approximately **$30,000 each**, roughly 50% more than the ~$20,000 charged for 3nm.
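To make the wafer-price figures concrete, the sketch below converts a wafer price into a rough cost per die. It is a back-of-the-envelope illustration, not TSMC's pricing model: the 300 mm wafer diameter, the 100 mm² die area, and the idea of applying the 15% density uplift directly as a die shrink are all assumptions introduced here, and yield loss is ignored entirely.

```python
import math

WAFER_DIAMETER_MM = 300.0          # assumption: standard 300 mm wafer
HYPOTHETICAL_DIE_AREA_MM2 = 100.0  # assumption: illustrative die size, not from the text


def gross_dies_per_wafer(die_area_mm2: float,
                         wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """First-order estimate: wafer area / die area, minus an edge-loss term.

    Ignores scribe lines, edge exclusion, and defects (yield).
    """
    radius = wafer_diameter_mm / 2.0
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(area_term - edge_term)


def cost_per_gross_die(wafer_price_usd: float, die_area_mm2: float) -> float:
    """Wafer price spread evenly over all gross dies (no yield adjustment)."""
    return wafer_price_usd / gross_dies_per_wafer(die_area_mm2)


# Wafer prices quoted in the text.
n3_cost = cost_per_gross_die(20_000, HYPOTHETICAL_DIE_AREA_MM2)
n2_same_area = cost_per_gross_die(30_000, HYPOTHETICAL_DIE_AREA_MM2)

# Assumption: the same design shrinks by the quoted 15% density uplift on N2.
n2_shrunk = cost_per_gross_die(30_000, HYPOTHETICAL_DIE_AREA_MM2 / 1.15)

print(f"N3, 100 mm2 die:          ${n3_cost:,.2f} per gross die")
print(f"N2, same die area:        ${n2_same_area:,.2f} per gross die")
print(f"N2, die shrunk by 15%:    ${n2_shrunk:,.2f} per gross die")
```

Under these assumptions the density uplift recovers only part of the 50% wafer-price increase, which is one way to read the claim that N2's gains come with a real cost premium per die.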

Volume production is currently centered at **Fab 22 in Kaohsiung** and **Fab 20 in Hsinchu**, with TSMC planning a **70% compound annual growth rate in 2nm capacity from 2026 to 2028**. Intel's rival **18A process**, by contrast, has entered volume manufacturing primarily for internal use and has struggled to win high-volume external foundry customers, which leaves TSMC as the dominant fabricator of frontier AI silicon.

The Advanced Packaging Bottleneck

As monolithic dies hit the physical reticle limit, performance scaling has shifted to multi-die architectures. TSMC's **CoWoS (Chip-on-Wafer-on-Substrate)** wafer-level packaging is the primary physical bottleneck of the AI accelerator supply chain. By placing logic dies and High Bandwidth Memory (HBM) stacks side by side on a silicon interposer, CoWoS enables high-bandwidth, low-latency inter-die connections.

TSMC projects **CoWoS capacity to grow more than 80% annually from 2022 to 2027**, while its **SoIC (System-on-Integrated-Chips)** 3D-stacking capacity is projected to increase **over 90% per year**. Despite aggressive domestic expansions in Taiwan and overseas projects in Arizona, Kumamoto, and Dresden, advanced packaging remains the single biggest chokepoint limiting AI accelerator shipments globally.
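Compound growth rates this high are easy to underestimate, so a short calculation helps. The sketch below turns a CAGR into a total capacity multiple. The function name is illustrative, the rates are treated as exact lower bounds (the text says "more than"), and applying the 2022 to 2027 window to SoIC is an assumption, since the text does not state a period for it.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` periods."""
    return (1.0 + cagr) ** years


# CoWoS: >80% per year, 2022 to 2027 (five compounding periods).
cowos = growth_multiple(0.80, 2027 - 2022)

# SoIC: >90% per year; the five-year window is an assumption, not from the text.
soic = growth_multiple(0.90, 5)

# N2 wafer capacity: 70% per year, 2026 to 2028 (two compounding periods).
n2 = growth_multiple(0.70, 2028 - 2026)

print(f"CoWoS capacity multiple: at least {cowos:.1f}x")
print(f"SoIC capacity multiple:  at least {soic:.1f}x (assumed window)")
print(f"N2 capacity multiple:    {n2:.2f}x")
```

An 80% annual rate sustained for five years implies capacity growing nearly nineteen-fold, which gives a sense of the demand that packaging still fails to keep up with.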

Memory Hierarchy & Numeric Asymmetries

Training and inference performance is governed by a constant trade-off between computation and data movement:

SRAM (Static RAM): Fast on-die memory with nanosecond-scale latency. It keeps weights and activations next to the compute units during execution, but it is expensive in silicon area per bit, so per-chip capacity is small. Architectures such as Groq's compensate by pooling SRAM across many chips over a scale-up interconnect.
HBM (High Bandwidth Memory): 3D-stacked DRAM connected to the logic die via a silicon interposer. The transition from Blackwell's HBM3e (8.0 TB/s) to next-generation **HBM4**, starting in late 2026 on the NVIDIA Rubin R100, is expected to deliver up to **22 TB/s bandwidth** and **288GB capacity** per GPU, easing (though not eliminating) the Key-Value (KV) cache memory bottleneck.
Numeric Precision: Training still relies largely on 16-bit formats (FP16/BF16), while inference has shifted aggressively to lower bit-widths. **FP8** and **FP4** (specifically NVIDIA's NVFP4 with micro-block scaling) allow up to **7× GEMM (General Matrix Multiply) speedups** over Hopper and shrink large models with minimal loss of accuracy.
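The interplay between memory bandwidth and numeric precision can be illustrated with a simple bandwidth-bound estimate. During single-stream decoding, every generated token requires reading all model weights from HBM once, so bandwidth divided by weight bytes gives an upper bound on tokens per second. The sketch below is a rough model under stated assumptions: the 70-billion-parameter model is hypothetical, and KV-cache traffic, scale-factor overhead for block-scaled formats, batching, and compute limits are all ignored.

```python
def decode_tokens_per_second_bound(num_params: float,
                                   bytes_per_param: float,
                                   bandwidth_tb_per_s: float) -> float:
    """Upper bound on single-stream decode rate when weight reads dominate.

    Each token needs one full pass over the weights, so the bound is
    bandwidth / total weight bytes.
    """
    weight_bytes = num_params * bytes_per_param
    return bandwidth_tb_per_s * 1e12 / weight_bytes


HYPOTHETICAL_PARAMS = 70e9                # assumption: illustrative model size
BF16_BYTES, FP4_BYTES = 2.0, 0.5          # bytes per parameter, ignoring scale factors
BLACKWELL_TB_S, RUBIN_TB_S = 8.0, 22.0    # bandwidth figures quoted in the text

for name, bw in [("Blackwell (HBM3e)", BLACKWELL_TB_S), ("Rubin (HBM4)", RUBIN_TB_S)]:
    for fmt, nbytes in [("BF16", BF16_BYTES), ("FP4", FP4_BYTES)]:
        bound = decode_tokens_per_second_bound(HYPOTHETICAL_PARAMS, nbytes, bw)
        print(f"{name:18s} {fmt:4s}: <= {bound:7.1f} tokens/s per stream")
```

In this simplified model the two levers multiply: moving from 8.0 to 22 TB/s raises the bound by 2.75×, and moving from BF16 to FP4 raises it by a further 4×, which is why bandwidth and precision are treated together here.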

Chapter Citations

[1] TSMC 2nm Capacity Projections. Focus Taiwan details on Kaohsiung Fab 22 / Hsinchu Fab 20 and 70% CAGR.
[2] TSMC Launches 2nm GAA Production. Volume production launch metrics, transistor density, and power curves.
[3] NVIDIA Rubin 336B Analysis. Detailed architectural teardown of R100, HBM4 integration, and N3 process.

All sources verified against primary SEC filings, lab blogs, and foundry registries.


AI Accelerator Specifications (2026 Landscape)

| Accelerator | Process Node | Transistors | On-Chip / HBM Memory | Bandwidth | Peak Performance | Status |
| --- | --- | --- | --- | --- | --- | --- |
| NVIDIA B300 | TSMC 4nm (N4P) | 208 billion | 192GB HBM3e | 8.0 TB/s | ~10-20 PFLOPS FP4 | Shipping (18-week lead times) |
| NVIDIA R100 (Rubin) | TSMC 3nm (N3) | 336 billion | 288GB HBM4 | 22.0 TB/s | 50 PFLOPS FP4 | Sampling Q4 2026, volume Q1 2027 |
| Intel Gaudi 3 | TSMC 5nm (N5) | Undisclosed | 128GB HBM2e | 3.7 TB/s | 1,835 TFLOPS BF16 | Shipping (200K-250K units target) |
| Groq 3 LPX Rack | Undisclosed | Undisclosed | 128GB SRAM (aggregate) | 640.0 TB/s (scale-up) | Ultra-low-latency LPU cluster | Shipping Q3 2026 |
