
A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"


Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.



This is a writeup based on a lightning talk the reporter gave at an InkHaven hosted by Georgia Ray, where they were supposed to read a paper in about an hour, and then present what they learned to other participants.


Summary

The reporter foolishly thought they could read a theoretical machine learning paper in an hour because it was in their area of expertise. Back in the olden days (2021), there was a dream that you could open up a neural network and understand it by looking at individual neurons. Then you could check whether the 'betray all humans' neuron is on. A serious obstacle to this was neuron polysemanticity: a single neuron would fire on a bunch of seemingly unrelated things, so you'd see the 'betray all humans' neuron firing on discussions of cats and the like. The explanation is superposition, in which a network packs more concepts than it has neurons into shared directions, and this is what neural networks are actually doing.
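The superposition idea can be sketched numerically. This is a hypothetical illustration, not the paper's construction: random unit directions in a low-dimensional space are nearly orthogonal, so many more "concepts" than neurons can be stored at the cost of small interference between them.

```python
import numpy as np

# Illustrative sketch: store m concept directions in an
# n-dimensional activation space with n << m.
rng = np.random.default_rng(0)
m, n = 1000, 100

# Random unit vectors in R^n are nearly orthogonal with high probability.
features = rng.standard_normal((m, n))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Activate a single concept, then read every concept back by dot product.
active = 42
activation = features[active]
readout = features @ activation

print(readout[active])                           # exactly 1.0 for the active concept
print(np.abs(np.delete(readout, active)).max())  # interference on the other 999, roughly 1/sqrt(n) scale
```

Ten times more concepts than dimensions, yet the active concept reads out cleanly while every inactive one registers only small noise; that interference is exactly what shows up as polysemantic neurons.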

In fact, it's probably important to think of neural networks as computing things, since that is what the interesting parts of the network are doing. As a general rule, neural networks are not representing concepts that God handed to them in their input. In 2024, the reporter did some work in this area (though their collaborators deserve more of the credit). The authors of the present paper are careful with their math and constructions in a way that the reporter thinks their own earlier work was not. That is, they show that for some classes of toy problems with m pure concepts, you need a network with at least sqrt(m/log(m)) neurons.
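In symbols, the bound as stated in the writeup is the following (the concrete numbers below are my own illustration, not from the source):

```latex
% Lower bound on the neuron count n for m pure concepts:
n \;\ge\; \sqrt{\frac{m}{\log m}}
```

For example, m = 10^6 concepts only forces n on the order of a few hundred neurons, so superposition permits dramatic compression of concepts into neurons, but the lower bound shows the compression is not unlimited.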

Read full article at Alignment Forum →