
Tech · 2 min read

Releasing these models should help the community to study these questions and build toward modular language models that are easier to deploy, adapt, inspect, and compose.

Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.

★ Tier-1 Source


Summary

Large language models are typically trained and deployed as monolithic systems: a single model is initialized, pretrained, fine-tuned, and served as one unified entity. Mixture-of-experts (MoE) models seem like a natural way to relax this constraint, but in practice existing MoEs still need the full model to work well. The team instead wants MoE models whose experts organize into coherent groups that can be selectively used and composed. One way to encourage this during pretraining is to route tokens to experts based on predefined semantic domains, such as math, biology, or code. However, fixing the domains upfront also fixes the model's modular structure: if a new domain or capability emerges at inference time, it is not obvious which experts should be used.
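To make the domain-based routing idea concrete, here is a minimal sketch of an MoE layer whose experts are grouped by predefined semantic domains and selected by a per-token domain label. It is not the method from the article; the class name, shapes, the two-experts-per-domain choice, and the use of PyTorch are all illustrative assumptions.

```python
# Sketch: an MoE layer with experts grouped by predefined domains
# (e.g. math, biology, code). Each token carries a domain id that picks
# the expert group; a small router then mixes the experts within that group.
# All names and hyperparameters here are hypothetical, not from the article.
import torch
import torch.nn as nn


class DomainRoutedMoE(nn.Module):
    def __init__(self, d_model: int, domains: list[str], experts_per_domain: int = 2):
        super().__init__()
        self.domains = domains
        self.experts_per_domain = experts_per_domain
        # One small feed-forward expert per (domain, slot) pair.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(len(domains) * experts_per_domain)
        )
        # The router only scores experts *within* a domain's group; the group
        # itself is fixed by the token's domain label.
        self.router = nn.Linear(d_model, experts_per_domain)

    def forward(self, x: torch.Tensor, domain_ids: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model); domain_ids: (num_tokens,) integer labels.
        out = torch.zeros_like(x)
        gates = torch.softmax(self.router(x), dim=-1)  # (num_tokens, experts_per_domain)
        for d in range(len(self.domains)):
            mask = domain_ids == d
            if not mask.any():
                continue
            # Mix this domain's experts for the tokens labeled with domain d.
            mixed = torch.zeros_like(x[mask])
            for slot in range(self.experts_per_domain):
                expert = self.experts[d * self.experts_per_domain + slot]
                mixed = mixed + gates[mask, slot : slot + 1] * expert(x[mask])
            out[mask] = mixed
        return out


# Usage: route a batch of token embeddings with hypothetical domain labels.
layer = DomainRoutedMoE(d_model=64, domains=["math", "biology", "code"])
tokens = torch.randn(10, 64)
domain_ids = torch.randint(0, 3, (10,))
print(layer(tokens, domain_ids).shape)  # torch.Size([10, 64])
```

Routing within a fixed domain group keeps the modular structure explicit, which illustrates the trade-off the summary describes: the structure is easy to inspect and compose, but it is decided before training, so a token from a domain that was never defined has no natural expert group at inference time.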

Read full article at Hugging Face →