The Groq 3 LPU, the Feynman architecture, and the SRAM-based inference chips unveiled at GTC 2026 require fundamentally different component structures from existing training chips. We analyze the key components facing supply shortages and the companies that stand to benefit.
A structural inflection point has arrived in the AI chip industry. At GTC 2026, NVIDIA unveiled the Groq 3 LPU, its first dedicated inference chip rather than a GPU. The chip integrates the Groq technology that NVIDIA acquired for ~$20 billion, uses 500MB of on-chip SRAM instead of HBM, and delivers 150TB/s of bandwidth, with a claimed inference throughput 35 times higher than that of Blackwell NVL72. [FACT]
Key Finding: The BOM (Bill of Materials) of an inference chip is fundamentally different from that of a training chip. SRAM accounts for 70-80% of the die area, and direct chip-to-chip interconnects (96 C2C links at 112Gbps each) are used instead of HBM. As a result, new shortages emerge at points in the supply chain entirely different from the bottlenecks that constrain existing training chips. [INFERENCE]
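To put the C2C figures in perspective, the per-chip aggregate link rate can be worked out directly from the two numbers quoted above. The following is a minimal sketch, assuming the 112Gbps figure is a raw per-link line rate in one direction and ignoring encoding and protocol overhead; the function name `aggregate_c2c_bandwidth` is hypothetical and used only for illustration.

```python
# Figures quoted in the text: 96 C2C links per chip, 112 Gbps per link.
C2C_LINKS_PER_CHIP = 96
LINK_RATE_GBPS = 112  # gigabits per second, assumed raw line rate per link


def aggregate_c2c_bandwidth(links: int, gbps_per_link: float) -> dict:
    """Return the raw aggregate line rate of all links in several units.

    Assumption: no encoding or protocol overhead is subtracted, so the
    result is an upper bound rather than usable payload bandwidth.
    """
    total_gbps = links * gbps_per_link
    return {
        "gbps": total_gbps,                      # gigabits per second
        "tbps": total_gbps / 1_000,              # terabits per second
        "tbytes_per_s": total_gbps / 8 / 1_000,  # terabytes per second
    }


if __name__ == "__main__":
    bw = aggregate_c2c_bandwidth(C2C_LINKS_PER_CHIP, LINK_RATE_GBPS)
    print(f"{bw['gbps']:,.0f} Gbps = {bw['tbps']:.3f} Tbps "
          f"= {bw['tbytes_per_s']:.3f} TB/s per chip")
```

Under these assumptions the 96 links sum to roughly 10.75 Tbps (about 1.34 TB/s) per chip, which is why high-speed SerDes and copper interconnect components move to the center of the inference BOM.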
The Feynman architecture, scheduled for release in 2028, introduces the TSMC A16 process, 3D die stacking (SRAM-over-compute), and silicon photonics, and it uses Intel EMIB packaging. Complete hardware separation of training and inference is becoming an industry standard. [FACT]
For the first time in the history of AI chips, training and inference are being split across physically different hardware. Training chips (GPUs) are optimized for massively parallel, large-scale matrix multiplication, while inference chips (LPUs) focus on ultra-low latency for sequential token generation. This structural difference fundamentally changes the composition of the BOM. [FACT]
At the GTC 2026 keynote, Jensen Huang set aside the "one GPU handles everything" philosophy for the first time in NVIDIA's history and unveiled dedicated inference hardware, the Groq 3 LPU. It is the first product of the integration that followed NVIDIA's ~$20 billion acquisition of Groq in December 2025. [FACT]
Groq 3 LPX Rack Specifications: 256 LPUs, 128GB of on-chip SRAM in total, and 40PB/s of rack-level bandwidth. The 32 compute trays (8 LPUs each) are directly connected by a copper spine. The claimed figures are 35 times higher throughput than Blackwell NVL72 and 1,500 tokens/sec on trillion-parameter models. [FACT]
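The rack-level totals can be cross-checked against the per-chip figures given earlier in this article (500MB of SRAM and 150TB/s per chip). The sketch below is illustrative only: it assumes decimal units (1GB = 1,000MB) and that the 150TB/s figure is a per-chip bandwidth that simply sums across the rack. The class `RackSpec` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackSpec:
    """Per-rack figures quoted in the article (decimal units assumed)."""
    trays: int = 32                 # compute trays per rack
    lpus_per_tray: int = 8          # LPUs per tray
    sram_mb_per_lpu: int = 500      # on-chip SRAM per LPU, in MB
    bw_tb_s_per_lpu: float = 150.0  # assumed per-chip bandwidth, in TB/s

    @property
    def lpus(self) -> int:
        return self.trays * self.lpus_per_tray

    @property
    def total_sram_gb(self) -> float:
        return self.lpus * self.sram_mb_per_lpu / 1_000

    @property
    def total_bw_pb_s(self) -> float:
        # Simple sum of per-chip bandwidth; a real system may not scale linearly.
        return self.lpus * self.bw_tb_s_per_lpu / 1_000


if __name__ == "__main__":
    rack = RackSpec()
    print(f"LPUs per rack:   {rack.lpus}")
    print(f"Total SRAM:      {rack.total_sram_gb:.0f} GB")
    print(f"Summed bandwidth: {rack.total_bw_pb_s:.1f} PB/s")
```

Under these assumptions, 32 trays of 8 LPUs give the stated 256 LPUs and 128GB of SRAM, while the summed bandwidth comes to 38.4PB/s, in line with the quoted 40PB/s if that figure is rounded.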
Samsung Electronics is mass-producing the Groq 3 LPU on its 4nm (SF4X) process. Because the die measures 700mm² or more, output per wafer is extremely low (approximately 64 chips per wafer). The goal is to increase wafer shipments by ~70%, from 9,000 to 15,000 wafers per year. [FACT]
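The wafer figures translate into chip and rack volumes with simple arithmetic. The following is a rough sketch, assuming that "approximately 64 chips per wafer" is the usable output per wafer (no further yield loss is applied) and reusing the 256-LPU rack size stated in the rack specifications. All function names are hypothetical.

```python
def annual_chip_output(wafers_per_year: int, chips_per_wafer: int) -> int:
    """Chips produced per year, assuming every counted chip is usable."""
    return wafers_per_year * chips_per_wafer


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def racks_supported(chips: int, lpus_per_rack: int = 256) -> int:
    """Number of complete racks that a given chip count can populate."""
    return chips // lpus_per_rack


if __name__ == "__main__":
    CHIPS_PER_WAFER = 64  # figure quoted in the text
    for wafers in (9_000, 15_000):
        chips = annual_chip_output(wafers, CHIPS_PER_WAFER)
        print(f"{wafers:>6,} wafers/yr -> {chips:>7,} chips "
              f"-> {racks_supported(chips):,} racks")
    print(f"Wafer increase: {percent_increase(9_000, 15_000):.1f}%")
```

On these assumptions, 9,000 wafers correspond to 576,000 chips (2,250 racks) and 15,000 wafers to 960,000 chips (3,750 racks); the exact wafer increase is 66.7%, which the text rounds to ~70%.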
At GTC 2026, NVIDIA previewed Feynman, the next-generation architecture after Vera Rubin. It includes the TSMC A16 (1.6nm) process, silicon photonics (optical NVLink), 3D die stacking (SRAM-over-compute), a custom Rosa CPU, and BlueField 5. NVIDIA claims 14x the performance of Blackwell. [FACT]
Intel supplies its EMIB (Embedded Multi-die Interconnect Bridge) packaging technology for Feynman. The combination of TSMC A16 and Intel EMIB is the industry's first cross-foundry advanced packaging collaboration. [FACT]