Vera Rubin NVL72, Groq 3, & Gaudi 3
AI accelerators are scaling in both transistor count and interconnect bandwidth. The NVIDIA Blackwell B300 (288GB HBM3e) is being succeeded by the Vera Rubin NVL72 system, which combines 72 Rubin R100 GPUs with 36 Vera CPUs and delivers 3.6 exaflops of FP4 compute. Alternative architectures such as the Groq 3 LPX Rack instead keep data in 128GB of aggregate ultra-low-latency on-chip SRAM, sidestepping the HBM bandwidth bottleneck.
- NVIDIA Rubin R100: 336B transistors, 288GB HBM4, 22 TB/s bandwidth, 50 PFLOPS FP4 compute.
- Intel Gaudi 3: 1,835 TFLOPS BF16, 128GB HBM2e, 3.7 TB/s bandwidth, priced disruptively at $15,625.
- Groq 3 LPX Rack: 256 LPUs, 500MB SRAM each (128GB aggregate), 640 TB/s scale-up bandwidth.
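
The rack-level figures quoted above follow from the per-chip numbers by simple multiplication. The sketch below reproduces that arithmetic in Python as a sanity check. The constant and function names are illustrative, not from any vendor tool, and decimal units (1 GB = 1000 MB, 1 exaflop = 1000 petaflops) are assumed. The dollars-per-TFLOPS figure is a derived ratio for the Gaudi 3 only. It is not a cross-vendor comparison, since the listed throughputs use different number formats (FP4 vs. BF16).

```python
# Figures copied from the spec list above; names are illustrative only.
RUBIN_GPUS_PER_NVL72 = 72       # Rubin R100 GPUs per Vera Rubin NVL72 system
RUBIN_FP4_PFLOPS = 50           # FP4 petaflops per R100
GROQ_LPUS_PER_RACK = 256        # LPUs per Groq 3 LPX Rack
GROQ_SRAM_MB_PER_LPU = 500      # on-chip SRAM per LPU, in MB
GAUDI3_BF16_TFLOPS = 1835       # BF16 teraflops per Gaudi 3
GAUDI3_PRICE_USD = 15625        # quoted price per Gaudi 3


def nvl72_fp4_exaflops() -> float:
    """Aggregate FP4 compute of one NVL72 system, in exaflops (decimal units assumed)."""
    return RUBIN_GPUS_PER_NVL72 * RUBIN_FP4_PFLOPS / 1000


def groq_rack_sram_gb() -> float:
    """Aggregate SRAM of one LPX rack, in GB (assumes 1 GB = 1000 MB)."""
    return GROQ_LPUS_PER_RACK * GROQ_SRAM_MB_PER_LPU / 1000


def gaudi3_usd_per_bf16_tflops() -> float:
    """Quoted Gaudi 3 price divided by its BF16 throughput."""
    return GAUDI3_PRICE_USD / GAUDI3_BF16_TFLOPS


print(f"NVL72 FP4 compute: {nvl72_fp4_exaflops():.1f} exaflops")
print(f"LPX rack SRAM:     {groq_rack_sram_gb():.0f} GB")
print(f"Gaudi 3 cost:      ${gaudi3_usd_per_bf16_tflops():.2f} per BF16 TFLOPS")
```

Both aggregates match the headline numbers in the text: 72 × 50 PFLOPS gives 3.6 exaflops, and 256 × 500MB gives 128GB.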