← Back to KHAO

Microsoft · Nvidia · Google ·

Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation

2 min read

Compiled by KHAO Editorial — aggregated from 2 sources. See llms.txt for citation guidance.

✓ KHAO Verified

Watch NVIDIA CEO Jensen Huang’s GTC Taipei Keynote.

Rather than generating text one word at a time, DiffusionGemma generates multiple words in parallel to output whole blocks of text, opening a new, low-latency frontier for the kind of single-user workloads that developers, researchers and AI enthusiasts run every day.

Key facts

Summary

Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation. Parallel generation: DiffusionGemma denoises up to 256 tokens per step instead of predicting one at a time. Built on Gemma 4: DiffusionGemma is built on Gemma 4, a 26-billion-parameter mixture-of-experts model that activates 3.8 billion parameters per step, pairing a diffusion head with Google’s Gemma 4 architecture. Up to 4x faster performance: The boost means fast text generation, where single-user generation usually stalls, on local hardware.

#Microsoft #Nvidia #Google