Agentic AI Chip Transition — In-Depth Analysis of Inference Chip Component Supply Chain

From Training to Inference: Component Map Changed by Agentic Chips

The Groq 3 LPU, the Feynman architecture, and the SRAM-based inference chips unveiled at GTC 2026 require a fundamentally different component structure from existing training chips. We analyze the key components facing supply shortages and the companies positioned to benefit.

DATE 2026-03-18

TYPE Trend

STYLE Stacker

HORIZON Mid (1-6M)

SIGNAL OW · +50

A structural inflection point has arrived in the AI chip industry. At GTC 2026, NVIDIA unveiled its first dedicated inference chip, the Groq 3 LPU, rather than another GPU. The chip integrates technology from Groq, which NVIDIA acquired for ~$20 billion, and uses 500MB of on-chip SRAM instead of HBM. It offers 150TB/s of bandwidth and, according to NVIDIA, 35 times higher inference throughput than Blackwell NVL72. [FACT]

Key Finding: The BOM (Bill of Materials) of an inference chip is fundamentally different from that of a training chip. SRAM accounts for 70-80% of the die area, and in place of HBM the chip relies on direct chip-to-chip interconnects (96 C2C links at 112Gbps each). This structural change creates new shortages at points entirely different from the bottlenecks in the existing training chip supply chain. [INFERENCE]
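The interconnect figures above imply an aggregate chip-to-chip bandwidth that can be set against the on-chip SRAM bandwidth quoted earlier. The following is a rough arithmetic sketch only. It assumes every link runs at its full 112Gbps line rate in one direction with no encoding or protocol overhead, which the source does not state, and the function names are illustrative.

```python
# Figures quoted in the text.
C2C_LINKS = 96             # chip-to-chip links per LPU
GBPS_PER_LINK = 112        # line rate per link, gigabits per second
SRAM_BANDWIDTH_TBPS = 150  # on-chip SRAM bandwidth, terabytes per second


def aggregate_c2c_gbps(links: int, gbps_per_link: float) -> float:
    """Total C2C bandwidth in gigabits per second (assumes no overhead)."""
    return links * gbps_per_link


def gbps_to_tbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to terabytes per second (decimal units)."""
    return gbps / 8 / 1000


c2c_gbps = aggregate_c2c_gbps(C2C_LINKS, GBPS_PER_LINK)
c2c_tbytes = gbps_to_tbytes_per_s(c2c_gbps)
sram_to_c2c_ratio = SRAM_BANDWIDTH_TBPS / c2c_tbytes

print(f"Aggregate C2C bandwidth: {c2c_gbps:,.0f} Gbps = {c2c_tbytes:.3f} TB/s")
print(f"On-chip SRAM bandwidth is about {sram_to_c2c_ratio:.0f}x the C2C aggregate")
```

Under these assumptions the 96 links add up to roughly 1.3 TB/s per chip, about two orders of magnitude below the quoted on-chip SRAM bandwidth.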

The Feynman architecture, scheduled for release in 2028, introduces the TSMC A16 process, 3D die stacking (SRAM-over-compute), and silicon photonics, and uses Intel EMIB packaging. Complete hardware separation of training and inference is becoming the industry standard. [FACT]

◆ INVESTMENT THESIS

"Own the Bottleneck" is the key strategy for the training→inference transition. Shortage severity by component: CoWoS packaging (CRITICAL) > SRAM die area (CRITICAL) > HBM4 (HIGH) > T-glass substrate (HIGH) > MLCC (MODERATE-HIGH). Companies with pricing power over the most severe bottlenecks (TSMC, SK Hynix, Samsung Electronics, and Broadcom) stand to benefit structurally.

Groq 3 LPU — NVIDIA's First Dedicated Inference Chip

At the GTC 2026 keynote, Jensen Huang abandoned NVIDIA's "one GPU handles everything" philosophy for the first time in the company's history and unveiled dedicated inference hardware, the Groq 3 LPU. It is the first product of the integration that followed NVIDIA's ~$20 billion acquisition of Groq in December 2025. [FACT]

Groq 3 LPX Rack Specifications: 256 LPUs, 128GB of total on-chip SRAM, and 40PB/s of rack-level bandwidth. The rack's 32 compute trays (8 LPUs each) are directly connected by a copper spine. NVIDIA claims 35 times higher throughput than Blackwell NVL72 and 1,500 tokens/sec on trillion-parameter models. [FACT]
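The rack-level figures can be cross-checked against the per-chip figures quoted earlier in this article (500MB of SRAM and 150TB/s per LPU). The sketch below is simple arithmetic in decimal units. It assumes rack totals are plain sums of the per-chip values, which is an assumption and not something the source states.

```python
# Per-chip and per-rack figures quoted in the article.
TRAYS_PER_RACK = 32
LPUS_PER_TRAY = 8
SRAM_PER_LPU_MB = 500        # on-chip SRAM per LPU, megabytes
BANDWIDTH_PER_LPU_TBPS = 150 # SRAM bandwidth per LPU, terabytes per second

# Rack totals, assuming they are straight sums of the per-chip values.
lpus_per_rack = TRAYS_PER_RACK * LPUS_PER_TRAY
rack_sram_gb = lpus_per_rack * SRAM_PER_LPU_MB / 1000
rack_bandwidth_pbps = lpus_per_rack * BANDWIDTH_PER_LPU_TBPS / 1000

print(f"LPUs per rack:       {lpus_per_rack}")
print(f"Rack SRAM:           {rack_sram_gb:.0f} GB")
print(f"Rack SRAM bandwidth: {rack_bandwidth_pbps:.1f} PB/s")
```

On these assumptions the LPU count and the 128GB SRAM total match the quoted specification exactly, while the summed bandwidth comes to 38.4PB/s, so the quoted 40PB/s appears to be a rounded figure.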

Samsung Electronics is mass-producing the Groq 3 LPU on its 4nm (SF4X) process. With a die size of 700mm² or more, yield is extremely low (approximately 64 chips per wafer). The goal is to increase wafer shipments by ~70%, from 9,000 to 15,000 wafers per year. [FACT]
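The wafer figures above translate into an approximate annual chip and rack supply. The sketch below is a back-of-the-envelope calculation under two assumptions that the source does not confirm: that the figure of 64 chips per wafer counts usable chips, and that every chip goes into a 256-LPU LPX rack.

```python
# Figures quoted in the article.
CHIPS_PER_WAFER = 64          # assumed here to mean usable chips per wafer
WAFERS_PER_YEAR_NOW = 9_000
WAFERS_PER_YEAR_TARGET = 15_000
LPUS_PER_RACK = 256           # from the LPX rack specification


def annual_chips(wafers_per_year: int, chips_per_wafer: int = CHIPS_PER_WAFER) -> int:
    """Chips produced per year at a given wafer volume."""
    return wafers_per_year * chips_per_wafer


def annual_racks(wafers_per_year: int) -> float:
    """Racks that could be populated per year if all chips go into LPX racks."""
    return annual_chips(wafers_per_year) / LPUS_PER_RACK


growth_pct = (WAFERS_PER_YEAR_TARGET - WAFERS_PER_YEAR_NOW) / WAFERS_PER_YEAR_NOW * 100

for label, wafers in [("current", WAFERS_PER_YEAR_NOW), ("target", WAFERS_PER_YEAR_TARGET)]:
    print(f"{label:>7}: {annual_chips(wafers):,} chips/year, "
          f"{annual_racks(wafers):,.0f} racks/year")
print(f"Wafer volume growth: {growth_pct:.1f}%")
```

On these assumptions the increase from 9,000 to 15,000 wafers works out to about 67%, consistent with the rounded ~70% figure, and lifts supply from about 576,000 to 960,000 chips per year.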

Feynman Architecture — 2028 Inference Native Platform

At GTC 2026, NVIDIA previewed Feynman, the next-generation architecture after Vera Rubin. It includes the TSMC A16 (1.6nm) process, silicon photonics (optical NVLink), 3D die stacking (SRAM-over-compute), a custom Rosa CPU, and BlueField 5. NVIDIA claims 14x performance compared to Blackwell. [FACT]

Intel supplies its EMIB (Embedded Multi-die Interconnect Bridge) packaging technology for Feynman. The combination of TSMC A16 and Intel EMIB is the industry's first cross-foundry advanced packaging collaboration. [FACT]

◆ VERA RUBIN + GROQ 3 = INFERENCE DISAGGREGATION

Within the Vera Rubin platform, the LPX uses the Attention-FFN Disaggregation (AFD) architecture. The Rubin GPU handles KV cache-based attention, while the LPX accelerates the Feed-Forward (FFN) and MoE layers. This structural innovation goes beyond the physical separation of training and inference and subdivides workloads even within inference. [FACT]
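The split described above can be pictured with a toy decode step in which a stateful attention worker owns the KV cache and a stateless FFN worker holds only weights. This is a conceptual sketch in plain Python, not NVIDIA's implementation. The class names, the single-head attention with identity projections, and the tiny ReLU feed-forward layer are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AttentionWorker:
    """Stands in for the GPU side: stateful, owns the growing KV cache."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def step(self, x: list) -> list:
        # Append the new token to the cache, then attend over all cached tokens.
        self.keys.append(x)
        self.values.append(x)
        scale = math.sqrt(len(x))
        scores = [sum(a * b for a, b in zip(x, k)) / scale for k in self.keys]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        return [sum(w / total * v[i] for w, v in zip(weights, self.values))
                for i in range(len(x))]


@dataclass
class FFNWorker:
    """Stands in for the LPX side: stateless, only weights stay resident."""
    w_up: list    # hidden_dim rows, each of length model_dim
    w_down: list  # model_dim rows, each of length hidden_dim

    def step(self, x: list) -> list:
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w_up]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_down]


def decode_step(x: list, attn: AttentionWorker, ffn: FFNWorker) -> list:
    """One layer of one decode step, with a hand-off between the two workers."""
    a = attn.step(x)                      # runs where the KV cache lives
    x = [xi + ai for xi, ai in zip(x, a)] # residual connection
    f = ffn.step(x)                       # activations handed to the FFN device
    return [xi + fi for xi, fi in zip(x, f)]


attn = AttentionWorker()
ffn = FFNWorker(w_up=[[1, 0], [0, 1], [1, 1]], w_down=[[0.5, 0, 0], [0, 0.5, 0]])
out = decode_step([1.0, 2.0], attn, ffn)
print(out, "cached tokens:", len(attn.keys))
```

The point of the sketch is the asymmetry: the attention worker's memory grows with every token, while the FFN worker needs only fixed weights, which is the kind of workload the article associates with SRAM-based hardware.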
