Nvidia · Google · Gemini

Google DeepMind publishes DiffusionGemma, a model that runs local AI 4x faster

Compiled by KHAO Editorial — aggregated from 2 sources. See llms.txt for citation guidance.

Another day, another AI model from Google.

Summary

This time, Google DeepMind has released a new member of the Gemma 4 open model family, and it is fundamentally different from the rest of the lineup. Most AI models are autoregressive: they generate text left to right, one token at a time. DiffusionGemma instead uses text diffusion, generating up to 256 tokens in parallel, which shifts the bottleneck from memory bandwidth to compute. The model is fairly large by the standards of Google's open models, yet in testing on an RTX 5090 it produces around 700 tokens per second. If diffusion is so much faster, why isn't Google using it in its big cloud-based Gemini models? Google has experimented with the approach, but text diffusion has a few drawbacks, including a higher error rate.
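To make the contrast between one-token-at-a-time and parallel generation concrete, here is a rough back-of-the-envelope sketch that counts sequential model passes for each approach. The function names are hypothetical, and the number of refinement passes per block (`denoise_steps=16`) is an assumption chosen for illustration; the sources do not state how many passes DiffusionGemma actually uses. Only the 256-token block size comes from the reporting above.

```python
import math


def autoregressive_passes(num_tokens: int) -> int:
    """One sequential forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int,
                           block_size: int = 256,
                           denoise_steps: int = 16) -> int:
    """Sequential passes when tokens are produced in parallel blocks.

    block_size=256 matches the "up to 256 tokens in parallel" figure.
    denoise_steps=16 is an illustrative assumption, not a published value.
    """
    blocks = math.ceil(num_tokens / block_size)
    return blocks * denoise_steps


if __name__ == "__main__":
    n = 1024
    print("autoregressive passes:", autoregressive_passes(n))
    print("block diffusion passes:", block_diffusion_passes(n))
```

Under these assumptions, fewer sequential passes are needed, but each pass processes a whole block at once and so does more arithmetic. That is consistent with the bottleneck moving from memory bandwidth to compute, though real speedups depend on details this sketch ignores.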