Microsoft · Nvidia · Google · NVIDIA Blog

Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation

Wed, Jun 10 · 4:15 PM UTC

Compiled by KHAO Editorial, aggregated from 2 sources.



Rather than generating text one word at a time, DiffusionGemma generates multiple words in parallel to output whole blocks of text, opening a new, low-latency frontier for the kind of single-user workloads that developers, researchers and AI enthusiasts run every day.

Key facts

Building Windows agents got a full toolset: NVIDIA and Microsoft rolled out turnkey agent sandboxing on native Windows (Microsoft eXecution Containers plus the NVIDIA OpenShell runtime), alongside up ...
DGX Spark goes from unboxing to a running agent in minutes: a streamlined NVIDIA NemoClaw install gets developers to a working local agent fast, with Qwen3.6-35B running up to 2.6x faster on vLLM
The fastest way to start testing and prototyping the model is through Hugging Face Transformers, which runs DiffusionGemma on a GeForce RTX 5090 or DGX Spark out of the box
NVIDIA researchers released SANA-WM, an open source world model that turns a single image and a camera path into a minute-long, 720p video with precise 6-DoF control

Summary

Today, Google DeepMind released DiffusionGemma, an experimental open model built for exceptionally fast text generation.
Parallel generation: DiffusionGemma denoises up to 256 tokens per step instead of predicting one token at a time.
Built on Gemma 4: the model pairs a diffusion head with Google’s Gemma 4 architecture, a 26-billion-parameter mixture-of-experts model that activates 3.8 billion parameters per step.
Up to 4x faster performance: the speedup brings fast text generation to local hardware, where single-user generation usually stalls.
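
To put the summary's figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (256 tokens per step, 26 billion total and 3.8 billion active parameters). The function names are hypothetical, and the step count is an idealized lower bound that assumes every step fills a full 256-token block. It counts model steps, not wall-clock time, and it does not derive or reproduce the reported "up to 4x" speedup.

```python
import math

# Figures quoted in the article; everything else below is illustrative.
TOKENS_PER_STEP = 256      # "up to 256 tokens per step"
TOTAL_PARAMS_B = 26.0      # 26-billion-parameter mixture-of-experts model
ACTIVE_PARAMS_B = 3.8      # 3.8 billion parameters activated per step


def autoregressive_steps(num_tokens: int) -> int:
    """One model step per generated token."""
    return num_tokens


def min_parallel_steps(num_tokens: int, tokens_per_step: int = TOKENS_PER_STEP) -> int:
    """Idealized lower bound: every step emits a full block of tokens.

    Assumption: a real diffusion decoder may spend more steps per block,
    so the true count can be higher than this.
    """
    return math.ceil(num_tokens / tokens_per_step)


def active_fraction(active_b: float = ACTIVE_PARAMS_B, total_b: float = TOTAL_PARAMS_B) -> float:
    """Share of the model's parameters used in a single step."""
    return active_b / total_b


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        print(f"{n} tokens: {autoregressive_steps(n)} one-at-a-time steps, "
              f"at least {min_parallel_steps(n)} parallel steps")
    print(f"Active parameter share per step: {active_fraction():.1%}")
```

The gap between the two step counts is far larger than 4x because each parallel step is not equally cheap and a block may need several denoising passes. The sketch only shows why fewer, wider steps can cut latency for a single user.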
