

Open models are driving a new wave of on-device AI, extending innovation beyond the cloud to everyday devices


Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.


All configurations were measured using Q4_K_M quantizations with batch size (BS) = 1, input sequence length (ISL) = 4,096 and output sequence length (OSL) = 128 on NVIDIA GeForce RTX 5090 and Mac M3 Ultra desktops. Token generation throughput was measured on llama.cpp build b7789 using the llama-bench tool.
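A setup like the one described could be reproduced with llama-bench roughly as follows. This is a sketch, not the exact command used: the model filename is a placeholder, and flag behavior can vary between llama.cpp builds.

```shell
# Hedged sketch of a llama-bench run matching the stated configuration.
# Assumptions: gemma-q4_k_m.gguf is a placeholder Q4_K_M model file;
# flag names as in recent llama.cpp builds.
./llama-bench \
  -m gemma-q4_k_m.gguf \
  -p 4096 \
  -n 128 \
  -b 1
```

Here `-p` sets the prompt (input) length, matching ISL = 4096; `-n` sets the number of generated tokens, matching OSL = 128; and `-b` sets the batch size, matching BS = 1. llama-bench reports prompt-processing and token-generation throughput in tokens per second.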

Designed for this shift, Google’s latest additions to the Gemma 4 family introduce a class of small, fast and omni-capable models built for efficient local execution across a wide range of devices.


Summary

Google and NVIDIA have collaborated to optimize Gemma 4 for NVIDIA GPUs, enabling efficient performance across systems ranging from data center deployments to NVIDIA RTX-powered PCs and workstations, the NVIDIA DGX Spark personal AI supercomputer and NVIDIA Jetson Orin Nano edge AI modules. The latest additions to the Gemma 4 family of open models, spanning E2B, E4B, 26B and 31B variants, are designed for efficient deployment from edge devices to high-performance GPUs. This new generation of compact models supports a wide range of tasks.

Read full article at NVIDIA Blog →

#nvidia #google