Business · NVIDIA Blog
NVIDIA Rolls Out Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More Efficient AI Agents
AI agent systems today juggle separate models for vision, speech and language — losing time and context as they pass data from one model to another.
Key facts
- AI and software companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the model
- The Nemotron 3 family — including Nano, Super and Ultra models — has seen over 50 million downloads in the past year
- Agents built on Nemotron 3 Nano Omni can rapidly interpret full HD screen recordings, something that wasn’t previously practical
- The model is available on nvidia.com as an NVIDIA NIM microservice and through a broad ecosystem of NVIDIA Cloud Partners, inference platforms and cloud service providers
Summary
Unveiled today, NVIDIA Nemotron 3 Nano Omni is an open multimodal model that brings these capabilities together into one system, enabling agents to deliver faster, smarter responses with advanced reasoning across video, audio, image and text. Nemotron 3 Nano Omni sets a new efficiency frontier for open multimodal models with leading accuracy and low cost, topping six leaderboards for complex document intelligence and video and audio understanding. AI and software companies already adopting Nemotron 3 Nano Omni include Aible, Applied Scientific Intelligence (ASI), Eka Care, Foxconn, H Company, Palantir and Pyler, with Dell Technologies, Docusign, Infosys, K-Dense, Lila, Oracle and Zefr evaluating the model. “To build useful agents, you can’t wait seconds for a model to interpret a screen,” said Gautier Cloix, CEO of H Company.