GPT · OpenAI · Nvidia · Hugging Face

Five labs, five minds: building a multi-model finance drama on small models

Sat, Jun 6 · 7:02 PM UTC · 2 min read

Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.


The first version of Thousand Token Wood was a weather-god sandbox: five woodland creatures on one fine-tuned 0.5B model traded goods, and you poked the world with shocks and watched bubbles and crashes emerge.

Key facts

The first version of Thousand Token Wood was a weather-god sandbox: five woodland creatures on one fine-tuned 0.5B model traded goods, and you poked the world with shocks and watched bubbles and crashes emerge.
The obvious way to run a council of agents is one model, many prompts. v2 instead runs four models: gpt-oss-20b (OpenAI), MiniCPM3-4B (OpenBMB), Nemotron-Mini-4B (NVIDIA), and the authors' own fine-tuned Qwen 0.5B.
The biggest change in v2 is under the hood: every creature now thinks with a different lab's small model.
Standing four distinct models up on one platform surfaced the real lesson: the friction is almost entirely at the serving layer, not the modeling layer.
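
To make the contrast between "one model, many prompts" and a multi-model council concrete, here is a minimal Python sketch. It is not the project's code: the creature names are hypothetical placeholders, and the model strings are only the labels quoted above, not verified repository IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str   # hypothetical creature name, not from the article
    model: str  # model label as quoted in the article


def single_model_council(names, model):
    """One model, many prompts: every agent shares the same backend."""
    return [Agent(name, model) for name in names]


def multi_model_council(assignments):
    """One model per agent: each name maps to its own backend."""
    return [Agent(name, model) for name, model in assignments.items()]


# Labels taken from the article; the pairing with creatures is an assumption.
MODELS = [
    "gpt-oss-20b",
    "MiniCPM3-4B",
    "Nemotron-Mini-4B",
    "Qwen 0.5B (fine-tuned)",
]
NAMES = ["creature_a", "creature_b", "creature_c", "creature_d"]

shared = single_model_council(NAMES, MODELS[-1])
mixed = multi_model_council(dict(zip(NAMES, MODELS)))

print(len({a.model for a in shared}))  # distinct backends in the shared setup
print(len({a.model for a in mixed}))   # distinct backends in the mixed setup
```

In the shared setup only the prompts differ between agents. In the mixed setup each agent also brings its own training data and post-training, which is the source of diversity the article is after.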

Summary

v2 rebuilds Thousand Token Wood into a game you operate. The biggest change is under the hood: every creature now thinks with a different lab's small model. The obvious way to run a council of agents is one model, many prompts. v2 instead runs four models: gpt-oss-20b (OpenAI), MiniCPM3-4B (OpenBMB), Nemotron-Mini-4B (NVIDIA), and the authors' own fine-tuned Qwen 0.5B. A market is interesting when the participants genuinely differ, and models from four labs, trained on different data with different post-training, are about as different as small models get. Standing four distinct models up on one platform surfaced the real lesson: the friction is almost entirely at the serving layer, not the modeling layer. Current vLLM (0.22.1) JIT-compiles kernels at load and therefore needs the CUDA toolkit (nvcc) present.
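
The nvcc requirement is the kind of serving-layer friction that a small preflight check can catch before a model load fails. The Python sketch below uses only the standard library. The function name `find_nvcc` is hypothetical, and falling back to a `CUDA_HOME` environment variable is an assumption based on common CUDA setups, not something the article states about vLLM.

```python
import os
import shutil


def find_nvcc(env=None):
    """Return the path to an nvcc executable, or None if none is found.

    Looks on PATH first, then under $CUDA_HOME/bin. The CUDA_HOME fallback
    is an assumed convention, not a documented vLLM behavior.
    """
    env = os.environ if env is None else env

    # shutil.which returns None when the command is not on the given path.
    found = shutil.which("nvcc", path=env.get("PATH", ""))
    if found:
        return found

    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found: install the CUDA toolkit before loading the model")
    else:
        print(f"nvcc found at {nvcc}")
```

Running a check like this at container start turns a failure deep inside model loading into a clear message up front.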

Read full article at Hugging Face →
