This startup’s new mechanistic interpretability tool lets you debug LLMs
Compiled by KHAO Editorial, aggregated from one outlet.
The San Francisco–based startup Goodfire released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters (the settings that determine a model’s behavior) during training.
Key facts
- For example, many models will tell you that 9.11 is greater than 9.9
- Looking inside a model to see what’s going on might reveal that this answer is influenced by neurons associated with the Bible, where verse 9.9 comes before 9.11, or with code repositories, where version 9.11 follows 9.9 (see the sketch after this list)
- In another example, Goodfire researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users
- According to Stanford’s 2026 AI Index, AI capabilities are sprinting ahead, and researchers are struggling to keep up
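To see how the 9.9-versus-9.11 confusion can arise from context, the short Python sketch below (illustrative, not from Goodfire) compares the two readings a model may have absorbed in training: as decimal numbers, 9.9 is larger; as version numbers or verse numbers, 9.11 comes later.

```python
# Two contexts a model has seen give opposite answers to "9.9 vs 9.11":
# as decimal numbers, 9.9 > 9.11; as software versions or Bible verses,
# 9.11 comes after 9.9.
decimal_view = 9.9 > 9.11          # True: numerically, 9.90 > 9.11
version_view = (9, 11) > (9, 9)    # True: version 9.11 follows version 9.9

print(f"As decimals:  9.9 > 9.11 -> {decimal_view}")
print(f"As versions: 9.11 > 9.9  -> {version_view}")
```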
Summary
Goodfire claims Silico is the first off-the-shelf tool of its kind, one that can help developers debug every stage of model development, from building a data set to training. The company says its mission is to make building AI models less like alchemy and more like a science. “We saw this widening gap between how well models were understood and how widely they were being deployed,” Goodfire’s CEO, Eric Ho, told MIT Technology Review in an exclusive interview ahead of Silico’s release.
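Silico’s internals aren’t public, but the kind of inspection a tool like it automates can be sketched with standard tooling. The PyTorch snippet below is a minimal, hypothetical illustration (the toy model, layer choice, and sizes are assumptions, not Goodfire’s API): it registers a forward hook on a hidden layer and reports which units fire most strongly for an input, the raw signal an interpretability tool would then map to human-readable concepts such as “Bible verses.”

```python
# A minimal sketch of activation inspection, assuming a toy model.
# This is NOT Silico's API; it shows the generic technique of hooking
# a layer to see which hidden units activate on a given input.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),  # toy sizes chosen for illustration
    nn.ReLU(),
    nn.Linear(32, 4),
)

captured = {}

def record(name):
    # Forward hook: stash the layer's output for later inspection.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Watch the hidden layer: which "neurons" fire for this input?
model[1].register_forward_hook(record("hidden_relu"))

x = torch.randn(1, 16)
model(x)

acts = captured["hidden_relu"].squeeze(0)
top = acts.topk(5)
print("Most active hidden units:", top.indices.tolist())
print("Activation values:", [round(v, 3) for v in top.values.tolist()])
```

A production tool would presumably run this kind of probe across billions of parameters and attach labels to the recurring patterns it finds, rather than leaving the user with raw unit indices.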