Nvidia · Hugging Face

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Mon, May 18 · 4:00 PM UTC

Compiled by KHAO Editorial — aggregated from 1 source + 2 references discovered via search. See llms.txt for citation guidance.

★ Tier-1 Source

Figure: Sampson Error (lower is better). Both Temporal and Cross-view Sampson Errors decrease after fine-tuning, indicating improved temporal stability and multi-view geometric consistency.
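For readers unfamiliar with the metric, the Sampson error is the standard first-order approximation of the geometric distance between a point correspondence and the epipolar constraint defined by a fundamental matrix F. The sketch below is a minimal pure-Python version of that textbook formula. How the original article estimates F or selects correspondences (across frames for the temporal variant, across cameras for the cross-view variant) is not stated in this summary, so the function name and the example matrix are illustrative assumptions.

```python
def sampson_error(F, x, x_prime):
    """Sampson (first-order geometric) error for one correspondence.

    F        -- 3x3 fundamental matrix as nested lists (assumed given)
    x        -- (u, v) pixel coordinates in the first image
    x_prime  -- (u, v) pixel coordinates in the second image
    """
    p = (x[0], x[1], 1.0)              # homogeneous point in image 1
    q = (x_prime[0], x_prime[1], 1.0)  # homogeneous point in image 2

    # F @ p and F^T @ q (the two epipolar lines)
    Fp = [sum(F[i][k] * p[k] for k in range(3)) for i in range(3)]
    Ftq = [sum(F[k][i] * q[k] for k in range(3)) for i in range(3)]

    # Squared algebraic residual q^T F p
    numerator = sum(q[i] * Fp[i] for i in range(3)) ** 2
    # Only the first two components of each epipolar line enter the denominator
    denominator = Fp[0] ** 2 + Fp[1] ** 2 + Ftq[0] ** 2 + Ftq[1] ** 2
    return numerator / denominator


# Hypothetical F for a purely horizontal camera shift: matches must share a row.
F_horizontal = [[0.0, 0.0, 0.0],
                [0.0, 0.0, -1.0],
                [0.0, 1.0, 0.0]]

consistent = sampson_error(F_horizontal, (1.0, 2.0), (5.0, 2.0))    # same row
inconsistent = sampson_error(F_horizontal, (1.0, 2.0), (5.0, 3.0))  # one row off
```

A correspondence that satisfies the epipolar constraint scores zero, and the score grows as the match drifts away from its epipolar line, which is why lower values indicate better geometric consistency.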

NVIDIA Cosmos Predict 2.5 is a large-scale world model capable of generating physically plausible videos conditioned on text, images, or video clips.

Key facts

LoRA adapters are injected into the DiT's attention projections (to_q, to_k, to_v, to_out.0) and feedforward layers (ff.net.0.proj, ff.net.2).
Training for 100 epochs (about 2.5 hours on 8× H100s) is already sufficient to substantially improve all three metrics.
The team uses Cosmos Reason2 as an LLM judge, scoring each example from 1 to 5.
The team uses rank=32 as a starting point, resulting in ~50M trainable parameters.
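The adapter setup described in the key facts can be sketched roughly as follows. The target module names and rank=32 come from the summary above; everything else is an assumption made for illustration: the helper names, the choice of lora_alpha equal to the rank, and the layer shapes used in the parameter count are hypothetical and are not taken from the original guide. `LoraConfig` and its `use_dora` flag are part of the peft library, which is imported lazily so the arithmetic can be run without it.

```python
# Module names as listed in the summary (attention projections + feedforward).
TARGET_MODULES = ["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]


def adapter_kwargs(rank=32, use_dora=False):
    """Keyword arguments for peft.LoraConfig.

    lora_alpha = rank is an assumed default, not a value from the article.
    """
    return {
        "r": rank,
        "lora_alpha": rank,
        "target_modules": list(TARGET_MODULES),
        "use_dora": use_dora,  # True switches the adapter from LoRA to DoRA
    }


def build_peft_config(rank=32, use_dora=False):
    """Build the config object; requires peft to be installed."""
    from peft import LoraConfig
    return LoraConfig(**adapter_kwargs(rank=rank, use_dora=use_dora))


def lora_param_count(layer_shapes, rank):
    """Trainable LoRA parameters: each (d_in, d_out) linear layer gains
    an A matrix of rank x d_in and a B matrix of d_out x rank."""
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)


# Hypothetical shapes for ONE transformer block (hidden=2048, ff=8192).
# The real Cosmos Predict 2.5 dimensions are not given in this summary.
one_block = [(2048, 2048)] * 4 + [(2048, 8192), (8192, 2048)]
params_per_block = lora_param_count(one_block, rank=32)
```

Because the count scales linearly with rank, halving or doubling the rank halves or doubles the number of trainable parameters, which makes rank a convenient first knob to adjust.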

Summary

Training robot policies requires demonstration data, but collecting real-robot trajectories is slow and expensive. This guide walks through parameter-efficient fine-tuning of Cosmos Predict 2.5 with LoRA and DoRA, using the diffusers and accelerate libraries with support for both single- and multi-GPU training. Because only the small adapter weights are trained, it is practical to fine-tune on a single GPU and to swap adapters for different domains at inference. Software requirements: diffusers (which pulls in transformers and peft automatically) and accelerate. Hardware requirements: at minimum one 80 GB GPU for single-GPU training; 8× H100s are recommended for faster iteration.
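To make the difference between the two adapter types concrete, here is a minimal pure-Python sketch of how each one modifies a frozen weight matrix. LoRA adds a scaled low-rank product B·A to the base weight; DoRA additionally splits the result into a direction and a learnable magnitude. The function names are hypothetical, the toy matrices are made up, and normalizing per output row is an assumption about the convention; real implementations operate on tensors inside the model.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_weight(w0, a, b, alpha, rank):
    """Effective LoRA weight: W0 + (alpha / rank) * B @ A.

    w0 is out x in, b is out x rank, a is rank x in.
    """
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))]
            for i in range(len(w0))]


def row_norms(w):
    """Euclidean norm of each output row."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def dora_weight(w0, a, b, magnitude, alpha, rank):
    """Effective DoRA weight: magnitude * normalized(W0 + scaled B @ A)."""
    directed = lora_weight(w0, a, b, alpha, rank)
    norms = row_norms(directed)
    return [[magnitude[i] * v / norms[i] for v in row]
            for i, row in enumerate(directed)]


# Toy frozen weight and a rank-1 adapter with B initialised to zero.
w0 = [[3.0, 4.0], [0.0, 2.0]]
a = [[1.0, 0.0]]
b_init = [[0.0], [0.0]]
m_init = row_norms(w0)  # magnitude starts at the norm of the frozen weight

lora_at_init = lora_weight(w0, a, b_init, alpha=1, rank=1)
dora_at_init = dora_weight(w0, a, b_init, m_init, alpha=1, rank=1)
```

With B initialised to zero, both variants reproduce the frozen weight exactly, so fine-tuning starts from the behavior of the base model; only the small A, B, and (for DoRA) magnitude tensors need to be stored per domain.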



Full coverage

Other sources that covered this story, discovered via Google News. 2 unverified additional sources beyond our direct ingest — use these to verify claims, compare framings, or quote specific publications.

Tier 1 — direct ingest

Hugging Face May 18 · 16:00 UTC


Tier 3 — covered but not verified (2)

Intellectia.ai May 17 · 21:10 UTC

The artificial intelligence revolution has transformed the semiconductor industry into one of the most compelling investment opportunities of 2026.

The Daily Upside May 16 · 16:10 UTC

The list of companies creating technologies that could reduce the industry’s reliance on Nvidia might be longer than a shopping list for making a traditional mole poblano.