Five labs, five minds: building a multi-model finance drama on small models
Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.
The first version of Thousand Token Wood was a weather-god sandbox: five woodland creatures on one fine-tuned 0.5B model traded goods, and you poked the world with shocks and watched bubbles and crashes emerge.
Key facts
- The first version of Thousand Token Wood was a weather-god sandbox: five woodland creatures on one fine-tuned 0.5B model traded goods, and you poked the world with shocks and watched bubbles and crashes emerge
- The obvious way to run a council of agents is one model, many prompts. v2 instead runs four models: gpt-oss-20b (OpenAI), MiniCPM3-4B (OpenBMB), Nemotron-Mini-4B (NVIDIA), and the authors' own fine-tuned Qwen 0.5B
- And the biggest change is under the hood: every creature now thinks with a different lab's small model
- Standing four distinct models up on one platform surfaced the real lesson: the friction is almost entirely at the serving layer, not the modeling layer
Summary
v2 rebuilt Thousand Token Wood into a game you operate. The biggest change is under the hood: every creature now thinks with a different lab's small model. The obvious way to run a council of agents is one model, many prompts. v2 instead runs four models: gpt-oss-20b (OpenAI), MiniCPM3-4B (OpenBMB), Nemotron-Mini-4B (NVIDIA), and the authors' own fine-tuned Qwen 0.5B. A market is interesting when the participants genuinely differ, and four labs' models trained on different data with different post-training are about as different as small models get.

Standing four distinct models up on one platform surfaced the real lesson: the friction is almost entirely at the serving layer, not the modeling layer. For example, current vLLM (0.22.1) JIT-compiles kernels at load and needs the CUDA toolkit (nvcc) present.
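A minimal Python sketch of the two ideas in this summary follows: spreading creatures across several models, and checking for nvcc before a serving process starts. It is illustrative only, not the project's code. The creature names, the round-robin assignment, the placeholder name for the Qwen fine-tune, and both function names are assumptions; the source does not say how five creatures map onto four models.

```python
import shutil
from itertools import cycle

# Model names as listed in the summary; exact repository IDs are not given there.
MODELS = [
    "gpt-oss-20b",          # OpenAI
    "MiniCPM3-4B",          # OpenBMB
    "Nemotron-Mini-4B",     # NVIDIA
    "qwen-0.5b-finetuned",  # placeholder name for the authors' own fine-tune
]


def assign_models(creatures, models=MODELS):
    """Map each creature to a model, cycling when creatures outnumber models.

    Round-robin is an assumption made for this sketch.
    """
    return dict(zip(creatures, cycle(models)))


def cuda_toolkit_available(which=shutil.which):
    """Return True if an nvcc executable is found on PATH.

    The lookup function is injectable so the check can be exercised
    on machines without CUDA installed.
    """
    return which("nvcc") is not None


if __name__ == "__main__":
    # Hypothetical creature names, used only for illustration.
    roster = assign_models(["owl", "fox", "badger", "hare", "mole"])
    for creature, model in roster.items():
        print(f"{creature}: {model}")
    if not cuda_toolkit_available():
        print("nvcc not found on PATH; install the CUDA toolkit before serving")
```

Running the preflight check before any model loads turns a missing toolkit into an immediate, readable message rather than a failure partway through startup.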