
A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"


Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.



This is a writeup based on a lightning talk the reporter gave at an InkHaven hosted by Georgia Ray, where they were supposed to read a paper in about an hour, and then present what they learned to other participants.


Summary

The reporter foolishly thought they could read a theoretical machine learning paper in an hour because it was in their area of expertise. Back in the olden days (2021), there was a dream that you could open up a neural network and understand it by looking at individual neurons. Then you could check whether the 'betray all humans' neuron is on. A serious obstacle to this was neuron polysemanticity: a single neuron would fire on a bunch of seemingly unrelated things, so you'd see the 'betray all humans' neuron firing on discussions of cats and the like. The explanation is superposition, in which a network packs more concepts than it has neurons into shared directions, and this is what neural networks are actually doing.
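The superposition idea can be sketched numerically. This is a hypothetical illustration, not the paper's construction: random unit directions in a low-dimensional space are nearly orthogonal, so many more "concepts" than neurons can be stored at the cost of small interference between them.

```python
import numpy as np

# Illustrative sketch: store m concept directions in an
# n-dimensional activation space with n << m.
rng = np.random.default_rng(0)
m, n = 1000, 100

# Random unit vectors in R^n are nearly orthogonal with high probability.
features = rng.standard_normal((m, n))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Activate a single concept, then read every concept back by dot product.
active = 42
activation = features[active]
readout = features @ activation

print(readout[active])                           # exactly 1.0 for the active concept
print(np.abs(np.delete(readout, active)).max())  # interference on the other 999, roughly 1/sqrt(n) scale
```

Ten times more concepts than dimensions, yet the active concept reads out cleanly while every inactive one registers only small noise; that interference is exactly what shows up as polysemantic neurons.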

In fact, it's probably important to think of neural networks as computing things, since that is what the interesting parts of the network are doing. As a general rule, neural networks are not representing concepts that God handed to them in their input. In 2024, the reporter did some work in this area (though their collaborators deserve more of the credit). The authors of the present paper are careful with their math and constructions in a way that the reporter thinks their own earlier work was not. That is, they show that for some classes of toy problems with m pure concepts, you need a network with at least sqrt(m/log(m)) neurons.
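In symbols, the bound as stated in the writeup is the following (the concrete numbers below are my own illustration, not from the source):

```latex
% Lower bound on the neuron count n for m pure concepts:
n \;\ge\; \sqrt{\frac{m}{\log m}}
```

For example, m = 10^6 concepts only forces n on the order of a few hundred neurons, so superposition permits dramatic compression of concepts into neurons, but the lower bound shows the compression is not unlimited.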

Read full article at Alignment Forum →