Inflection · The Register
Inference is giving AI chip outfits a second chance to make their mark
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
AI adoption is reaching an inflection point as the focus shifts from training new models to serving them.
Key facts
- Lumai expects its next-gen Iris Tetra systems to achieve one exaOPS of AI performance within a 10 kW power budget by 2029
- Nvidia acquihired Groq for $20 billion in December
- The architecture is still in its infancy, capable of running billion-parameter models like Llama 3.1 8B or 70B today, but it's far enough along that the UK-based startup has opened its chips up
- The startup's SRAM-heavy chip architecture meant that, with enough of them, Groq's LPUs could churn out tokens faster than any GPU
Summary
Compared to training, inference is a far more diverse workload, which gives chip startups an opening to carve out a niche for themselves. As a result, inference has become increasingly heterogeneous: some stages of the pipeline suit GPUs, while others favor more specialized hardware. Nvidia's $20 billion acquihire of Groq in December is a prime example. Nvidia sidestepped the mismatch by moving the compute-heavy prefill stage of the inference pipeline to its GPUs while keeping the bandwidth-constrained decode operations on its newly acquired LPUs.
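The prefill/decode split described above can be sketched in a few lines. This is a minimal illustrative model, not Nvidia's or Groq's actual serving stack: all function and field names here are hypothetical, and the "devices" are stand-ins for the GPU and LPU pools.

```python
# Hypothetical sketch of disaggregated inference serving: the compute-bound
# prefill stage runs on one device pool (GPUs), the bandwidth-bound decode
# stage on another (LPUs). All names are illustrative, not a real API.

from dataclasses import dataclass, field


@dataclass
class Request:
    prompt_tokens: list                          # prompt to prefill (compute-heavy)
    max_new_tokens: int                          # tokens to decode (bandwidth-heavy)
    kv_cache: dict = field(default_factory=dict)   # handed off between stages
    output: list = field(default_factory=list)     # generated tokens


def prefill_on_gpu(req: Request) -> Request:
    """Process the whole prompt at once; this stage is dominated by large
    matrix multiplies, so throughput-oriented GPUs fit well. The product
    is a KV cache that gets shipped to the decode pool."""
    req.kv_cache = {"cached_len": len(req.prompt_tokens)}
    return req


def decode_on_lpu(req: Request) -> Request:
    """Generate tokens one step at a time; each step re-reads the model
    weights and KV cache, so memory bandwidth, not FLOPS, is the limit."""
    for i in range(req.max_new_tokens):
        req.output.append(f"tok{i}")             # placeholder for a sampled token
    return req


def serve(req: Request) -> list:
    """Route a request through the disaggregated pipeline."""
    return decode_on_lpu(prefill_on_gpu(req)).output
```

The point of the split is that the two stages have opposite bottlenecks, so a serving system can size and schedule each device pool independently rather than forcing one chip to do both jobs badly.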