
Nvidia · Google · Apple · AI Agent

Google's DiffusionGemma AI Hits 1,000 Tokens Per Second—And It's Free


Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.

★ Tier-1 Source

Google dropped DiffusionGemma today, an open-weight AI model that generates text the way image generators create pictures: it starts with noise and refines it until the output makes sense.


Summary

Every LLM you've used is a typewriter: it produces text one token at a time. Google released DiffusionGemma, a free open-weight model that instead generates entire 256-token blocks simultaneously via text diffusion. It hits over 1,000 tokens per second on an NVIDIA H100, four times faster than standard autoregressive models. The custom drafter module that DiffusionGemma needs for local inference doesn't exist in any public runtime yet, including mlx-lm and LM Studio, making it effectively unrunnable on most consumer setups today. On NVIDIA NIM, the model arrived preconfigured with 8,192 tokens of context, below the 64,000-token floor that agentic frameworks like Hermes Agent require, so autonomous workflows won't run without manual reconfiguration.
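To make the numbers in the summary concrete, here is a rough back-of-the-envelope sketch in Python. The constants are the figures quoted above; the function names are hypothetical and exist only for illustration. Because the article says "over 1,000" tokens per second, the results should be read as approximate bounds, not measurements.

```python
# Figures quoted in the summary; treat them as approximate.
DIFFUSION_TOKENS_PER_SEC = 1_000   # "over 1,000 tokens per second" on an H100
SPEEDUP_VS_AUTOREGRESSIVE = 4      # "four times faster"
BLOCK_SIZE_TOKENS = 256            # tokens generated per diffusion block
NIM_DEFAULT_CONTEXT = 8_192        # preconfigured context on NVIDIA NIM
AGENT_CONTEXT_FLOOR = 64_000       # floor cited for agentic frameworks


def implied_autoregressive_rate(diffusion_rate: float, speedup: float) -> float:
    """Baseline tokens/sec implied by the quoted speedup (hypothetical helper)."""
    return diffusion_rate / speedup


def seconds_per_block(block_tokens: int, tokens_per_sec: float) -> float:
    """Time to emit one block at a given throughput (hypothetical helper)."""
    return block_tokens / tokens_per_sec


def context_shortfall(configured: int, required: int) -> int:
    """Tokens of context missing relative to a required floor (0 if none)."""
    return max(0, required - configured)


if __name__ == "__main__":
    baseline = implied_autoregressive_rate(
        DIFFUSION_TOKENS_PER_SEC, SPEEDUP_VS_AUTOREGRESSIVE
    )
    print(f"Implied autoregressive baseline: ~{baseline:.0f} tokens/sec")
    print(
        "One 256-token block at the quoted rate: "
        f"~{seconds_per_block(BLOCK_SIZE_TOKENS, DIFFUSION_TOKENS_PER_SEC):.3f} s"
    )
    print(
        "Context shortfall on the NIM default: "
        f"{context_shortfall(NIM_DEFAULT_CONTEXT, AGENT_CONTEXT_FLOOR):,} tokens"
    )
```

Under these figures, the implied autoregressive baseline is roughly 250 tokens per second, and the NIM default leaves the context window tens of thousands of tokens short of the cited floor, which is why reconfiguration is needed before agentic use.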

Read full article at Decrypt →
