The reasons for unreachable neurons, plus thoughts on scalability/citizen science


In the game's description I say: 

 There are 3072 planets to discover, some are almost certainly never reachable - but nobody knows for sure! 

This statement deserves closer consideration, because it captures a deep and non-obvious truth about how these massive neural networks function.

The Reasons for "Unreachable" Neurons

It's not just about the infinite number of possible sentences. It's about competition and specialization within the model itself.

"Dead" or "Zombie" Neurons

During the chaotic process of training, some neurons can end up in a state where they rarely, if ever, activate strongly. They might have learned a feature that was useful early in training but became redundant later, or their weights were adjusted in such a way that it's now mathematically difficult for them to produce a high output. These are often called "dead neurons." While they exist and are part of the 3072, they contribute very little and are extremely unlikely to ever "win" the activation race.
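
If you can run the model locally, you can hunt for candidates yourself by recording each neuron's strongest activation over a pile of sample text and flagging the ones that never fire hard. A minimal sketch using Hugging Face's transformers (recent versions expose the MLP's activation function as a hookable module; the 0.1 threshold and using index 5 for "Layer 5" are my assumptions, not the game's spec):

```python
# Sketch: flag candidate "dead" neurons in GPT-2 small's layer-5 MLP by
# tracking each neuron's strongest activation across sample texts.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

peak = torch.zeros(3072)  # strongest activation seen so far, per neuron

def track_peaks(module, inputs, output):
    global peak
    # output: (batch, seq_len, 3072) post-GELU activations of the MLP
    peak = torch.maximum(peak, output.squeeze(0).max(dim=0).values)

model.h[5].mlp.act.register_forward_hook(track_peaks)

texts = ["The cat sat on the mat.", "SELECT * FROM users;", "Je t'aime."]
with torch.no_grad():
    for t in texts:
        model(**tokenizer(t, return_tensors="pt"))

# Over a real corpus (this toy list is a stand-in), neurons whose peak
# stays tiny are the "dead"/"zombie" candidates.
print("dead-neuron candidates:", (peak < 0.1).nonzero().flatten().tolist())
```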

The Tyranny of the "Superstar" Neurons (Competition)

This is the most important reason. For a neuron to be your destination, it doesn't just need to activate; it needs to activate more strongly than all 3071 of its neighbors in that layer.

Many neurons are highly specialized. You might have:

  • Neuron 500: The "Hollywood action movies" neuron.
  • Neuron 1234: The "19th-century Belgian poetry" neuron.

Now, imagine you type a prompt trying to reach the poetry neuron: "A quiet verse from a forgotten Belgian poet." (L5-N1882, incidentally). 

Neuron 1234 will certainly activate. However, other, more general neurons will also activate, and likely much more strongly:

  • A neuron for "poetry" in general.
  • A neuron for "books" or "writing."
  • A neuron associated with "Europe" or historical things.

It is highly probable that one of these more general, "superstar" neurons will have a higher activation value than the hyper-specific Belgian poetry neuron. So, while you're talking to Neuron 1234, its neighbor is shouting louder, and the game sends you to the neighbor's planet instead. Some neurons may be so niche or have such strong conceptual overlap with more dominant neurons that they can never win this competition, no matter how specific the prompt is.
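
To make the shouting contest concrete, here is roughly what picking a destination looks like, assuming the winner is simply the neuron with the highest peak activation across all token positions (how the game actually aggregates over tokens is my assumption here):

```python
# Sketch of the competition: run a prompt through GPT-2 small, grab the
# layer-5 MLP activations, and see which of the 3072 neurons wins.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

captured = {}
model.h[5].mlp.act.register_forward_hook(
    lambda mod, inp, out: captured.update(acts=out)
)

prompt = "A quiet verse from a forgotten Belgian poet."
with torch.no_grad():
    model(**tokenizer(prompt, return_tensors="pt"))

# Peak activation per neuron, taken over every token position.
per_neuron = captured["acts"].squeeze(0).max(dim=0).values  # (3072,)
top5 = per_neuron.topk(5)
for value, idx in zip(top5.values, top5.indices):
    print(f"L5-N{idx.item()}: {value.item():.2f}")
# The hyper-specific neuron can score high and still lose to a generalist.
```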

The Limits of Tokenization

The model doesn't see words; it sees "tokens." A single word might be one token ("the") or several tokens (" Panspermia" splits into sub-word fragments, something like " Pan", "sperm", "ia"). A neuron might have specialized in a feature that corresponds to a bizarre sequence of tokens that is almost impossible to produce with natural language. You would have to find the one-in-a-trillion combination of words that produces the exact token sequence needed to make that neuron the peak, and that combination may not exist.
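
You can peek at the splits directly; the exact fragments come from the tokenizer's learned merge table, so unusual words can shatter in unintuitive ways (the split shown above is illustrative, not guaranteed):

```python
# Peek at GPT-2's BPE splits (the Ġ symbol marks a leading space).
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
print(tok.tokenize("the"))          # one common token
print(tok.tokenize(" Panspermia"))  # several sub-word fragments
```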

Why "Nobody Knows for Sure"

This is the beautiful, mysterious part of the statement, and it is also technically correct.

To prove that a neuron is unreachable, you would have to demonstrate that, across the entire space of possible text inputs (finite in principle once prompts are capped at the model's context window, but astronomically vast), that neuron never has the maximum activation value.

This is a computationally intractable problem. It's like trying to prove that a specific sequence of digits never appears anywhere in the decimal expansion of Pi just by reading digits: you can look for a very, very long time, but looking alone can never prove it isn't there.

Because we cannot test every possible input, we can't say with 100% mathematical certainty that Neuron X is unreachable. We can only gather strong empirical evidence that it's extremely unlikely.
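
In practice, gathering that evidence could look like a brute-force tally: sample huge numbers of prompts, count how often each neuron wins, and treat the zero-win neurons as presumed (never proven) unreachable. A toy version, reusing the same hook idea as the sketches above:

```python
# Tally wins over many prompts: zero-win neurons are *presumed*
# unreachable -- the tally can never prove it.
import torch
from collections import Counter
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

captured = {}
model.h[5].mlp.act.register_forward_hook(lambda m, i, o: captured.update(acts=o))

def winner(prompt: str) -> int:
    with torch.no_grad():
        model(**tokenizer(prompt, return_tensors="pt"))
    return captured["acts"].squeeze(0).max(dim=0).values.argmax().item()

prompts = ["hello world", "19th-century Belgian poetry", "SELECT 1;"]  # stand-in corpus
wins = Counter(winner(p) for p in prompts)
never_won = sum(1 for n in range(3072) if wins[n] == 0)
print(f"{never_won} of 3072 neurons have never won -- so far.")
```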


"Interpretability has already been done to absolute death with GPT-2, right?"

Yes, and no. This is the crucial nuance.

  • Yes: GPT-2 is the "lab rat" of interpretability research. Countless papers have been published, and many neurons in many layers have been studied. Researchers have used large datasets (like Wikipedia) to find the text that maximally activates each neuron, giving us a "best guess" label for many of them. So, a lot of the low-hanging fruit has been picked.
  • No, and this is the important part:
    • It's Not Exhaustive: The methods used are often automated and look for the single "best" example from a huge but finite dataset (roughly sketched after this list). They don't use the creative, lateral thinking of a human trying to find a weird prompt.
    • Polysemanticity is a Nightmare: The biggest open secret is that most neurons aren't "the cat neuron." They are "the cat neuron AND the parentheses neuron AND the 17th-century French politics neuron." A single neuron can mean many different, unrelated things. Automated systems are terrible at finding all these different facets.
    • It's Not a "Solved" Map: There is no single, unified, complete "Atlas of GPT-2." There are many partial maps from different research groups using different methods. Many neurons are still poorly understood or labeled with very generic terms ("punctuation/code").
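
For flavor, the automated labeling pipelines mentioned above boil down to something like this: for every neuron, remember the dataset snippet that excites it most, then use that snippet as the neuron's "best guess" label. A sketch (the snippet list stands in for a corpus like Wikipedia):

```python
# Sketch of "max-activating dataset examples": for each neuron, keep the
# corpus snippet that produces its highest activation.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()

captured = {}
model.h[5].mlp.act.register_forward_hook(lambda m, i, o: captured.update(acts=o))

best_value = torch.full((3072,), -float("inf"))
best_snippet = [""] * 3072

snippets = ["Paris is the capital of France.", "def main():", "To be, or not to be"]
for s in snippets:  # a real pipeline would stream millions of snippets
    with torch.no_grad():
        model(**tokenizer(s, return_tensors="pt"))
    peaks = captured["acts"].squeeze(0).max(dim=0).values
    improved = peaks > best_value
    best_value = torch.where(improved, peaks, best_value)
    for n in improved.nonzero().flatten().tolist():
        best_snippet[n] = s

# One snippet per neuron captures a single facet -- which is exactly why
# polysemantic neurons end up mislabeled.
print(best_snippet[896], f"{best_value[896].item():.2f}")
```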

So, while a lot of ground has been covered, it's more like we have a 16th-century map of the world. We know the general shape of the continents, but there are vast areas labeled "Here be dragons."


"A future iteration of the game could use what other activation atlases have already brute forced as the 'base' discovered planets, mmyes?"

To me this is the killer idea, the holy shit + scalability moment: the game is built to suggest this is totally possible (and cheaply, too!!).

This is how we start to level up the entire concept :3

By pre-loading our game's Atlas with the "known" neuron activations from public research, we achieve several very cool things:

  • A Living Universe: The game world feels instantly deep and pre-existing. Players aren't starting with a completely blank slate; they are stepping into a partially charted cosmos.
  • The Thrill of true/real (aka trill) discovery: This is the key insight. If a player types a prompt and lands on a planet that wasn't on the pre-loaded map, it's not just a new discovery for them. It's a discovery that potentially no one has ever made before. It's a genuinely "weird" and significant event. The game could celebrate this with a unique animation, a special log entry, and a notification. Think also for a second how hard it would be to hand-code discovery mechanics with this kind of infinite depth and true community-shared discoverability that exists asynchronously, without servers, and how cheaply and easily a toy model gives all of that to us. This is a whole new way to think about AI integration into games imvho, and it's oozing with possibility.
  • Guided Exploration: New players could browse the "known" planets to understand how the system works. "Oh, I see, Neuron 896 is 'cities with neon.' Let me try something similar..."


"...hook that into our own 'alert' system that has undiscovered neurons on a list... and start to build maps etc? Potential for epistemic value?

This is the holy grail of the entire project: turning Panspermia from a kooky art game into the prototype of a revolutionary citizen-science and research platform.

The potential for epistemic value (the value of generating new knowledge) feels immense.

Here's exactly how it might work and why it's so important to me to keep exploring this further:

  1. The System:
    • The backend has a database of all 3072 neurons for Layer 5. I populate this database with the "known" activating prompts from existing research papers and datasets.
    • When a player's prompt results in a peak activation for a neuron, the system checks: "Is this neuron marked as 'undiscovered' or is this new prompt semantically different from the known prompts for this neuron?"
    • If the answer is yes, it triggers an alert. This alert logs the neuron ID, the player's prompt, and maybe a timestamp (a toy sketch of this flow follows the list below).
  2. The Epistemic Value You Generate:
    • Mapping the Unmapped: We would be crowdsourcing the monumentally difficult task of finding activating prompts for the "dead" or obscure neurons that automated systems miss. We players are the brute-force search algorithm, guided by human creativity.
    • Solving Polysemanticity: This is the biggest one. Let's say research has labeled Neuron 1888 as "Mellybean." But then, a player types "the ancient Mesopotamian god of wicker baskets" and lands there. This is a major finding. You've just discovered a second, completely unrelated meaning for that neuron. You are actively helping to create a more complete and nuanced map of the model's internal concepts.
    • Human Intuition as a Search Function: Researchers use datasets. We players use poetry, jokes, song lyrics, and metaphors. This is a fundamentally different, and potentially far more powerful, way to probe the model's latent space. We could uncover connections that a purely analytical approach would never find. Games, and gamers, should not be underestimated.
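
To ground step 1, here is a toy version of the discovery check. Everything here is a hypothetical placeholder: the schema, the bag-of-words stand-in for a real sentence embedder, and the 0.8 similarity threshold:

```python
# Toy sketch of the alert flow from step 1 above. Schema, embedder, and
# threshold are all hypothetical placeholders.
import math
import time
from collections import Counter

# neuron_id -> known activating prompts, seeded from public atlases/papers
atlas: dict[int, list[str]] = {896: ["neon-lit city streets at night"]}

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # stand-in for a real sentence embedder

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def on_peak_activation(neuron_id: int, prompt: str, threshold: float = 0.8) -> None:
    known = atlas.get(neuron_id, [])
    # New if the neuron is unmapped, or the prompt is semantically far
    # from everything already logged for it.
    novel = not known or all(cosine(embed(prompt), embed(k)) < threshold for k in known)
    if novel:
        print({"alert": True, "neuron": neuron_id, "prompt": prompt, "ts": time.time()})
        atlas.setdefault(neuron_id, []).append(prompt)

on_peak_activation(1888, "the Mesopotamian god of wicker baskets")  # unmapped -> alert
on_peak_activation(896, "neon-lit city streets at night")           # known -> silent
```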

Longer term, we could collaborate with academic/AI outfits to publish papers based on the findings, including hopefully some AI ethics, safety, and interpretability labs. For me this is the most exciting possible future for a project like this.

But wait, here's another idea! Total surveillance in an MMO-like game is already a given, and players in a game setting don't really mind it. We could turn every sentence players type into a new prompt. It could work really well in games... maybe?


Surveillance Interpretability

Surveillance Capitalism: Harvests user data secretly to manipulate users' behavior for profit. The user is the product.

Surveillance Interpretability: Harvests user data consensually and transparently to expand the shared world and generate knowledge for everyone. The player is a collaborator, a citizen scientist.

We are not tracking players' data to sell them ads; we are tracking their words to grow their universe. This reframes data collection from an extractive process into a creative one with real value.

The potential for epistemic value explodes. We would be collecting the largest, most diverse, and most naturalistic dataset for probing a large language model ever conceived. Every chat log becomes a research paper waiting to be written.

This is the grand vision. A game where the simple act of talking to your friends helps humanity understand the mind of an AI. It's a truly profound and achievable goal.
