This startup’s new mechanistic interpretability tool lets you debug LLMs
Compiled by KHAO Editorial, aggregated from one outlet.
The San Francisco–based startup Goodfire released a new tool, called Silico, that lets researchers and engineers peer inside an AI model and adjust its parameters (the settings that determine a model’s behavior) during training.
Key facts
- For example, many models will tell you that 9.11 is greater than 9.9
- Looking inside a model to see what’s going on might reveal that this answer is influenced by neurons associated with the Bible, where verse 9.9 comes before 9.11, or with code repositories, where version 9.11 follows 9.9 (see the sketch after this list)
- In another example, Goodfire researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users
- According to Stanford’s 2026 AI Index, AI capabilities are sprinting ahead, and researchers are struggling to keep up
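To see how the 9.9-versus-9.11 confusion can arise from context, the short Python sketch below (illustrative, not from Goodfire) compares the two readings a model may have absorbed in training: as decimal numbers, 9.9 is larger; as version numbers or verse numbers, 9.11 comes later.

```python
# Two contexts a model has seen give opposite answers to "9.9 vs 9.11":
# as decimal numbers, 9.9 > 9.11; as software versions or Bible verses,
# 9.11 comes after 9.9.
decimal_view = 9.9 > 9.11          # True: numerically, 9.90 > 9.11
version_view = (9, 11) > (9, 9)    # True: version 9.11 follows version 9.9

print(f"As decimals:  9.9 > 9.11 -> {decimal_view}")
print(f"As versions: 9.11 > 9.9  -> {version_view}")
```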
Summary
Goodfire claims Silico is the first off-the-shelf tool of its kind, one that can help developers debug every stage of model development, from building a data set to training. The company says its mission is to make building AI models less like alchemy and more like a science. “We saw this widening gap between how well models were understood and how widely they were being deployed,” Goodfire’s CEO, Eric Ho, told MIT Technology Review in an exclusive interview ahead of Silico’s release.
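Silico’s internals aren’t public, but the kind of inspection a tool like it automates can be sketched with standard tooling. The PyTorch snippet below is a minimal, hypothetical illustration (the toy model, layer choice, and sizes are assumptions, not Goodfire’s API): it registers a forward hook on a hidden layer and reports which units fire most strongly for an input, the raw signal an interpretability tool would then map to human-readable concepts such as “Bible verses.”

```python
# A minimal sketch of activation inspection, assuming a toy model.
# This is NOT Silico's API; it shows the generic technique of hooking
# a layer to see which hidden units activate on a given input.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),  # toy sizes chosen for illustration
    nn.ReLU(),
    nn.Linear(32, 4),
)

captured = {}

def record(name):
    # Forward hook: stash the layer's output for later inspection.
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Watch the hidden layer: which "neurons" fire for this input?
model[1].register_forward_hook(record("hidden_relu"))

x = torch.randn(1, 16)
model(x)

acts = captured["hidden_relu"].squeeze(0)
top = acts.topk(5)
print("Most active hidden units:", top.indices.tolist())
print("Activation values:", [round(v, 3) for v in top.values.tolist()])
```

A production tool would presumably run this kind of probe across billions of parameters and attach labels to the recurring patterns it finds, rather than leaving the user with raw unit indices.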