Gamifying AI Interpretability (for good)

56 days ago by CriaFaar

Technical explanation part, for AI nerds and curious minds.

From Prompt to Planet: Interactive AI Interpretability in "Panspermia"

Made mostly in the space of a week, "Panspermia" isn't much of a game yet, but with time and love it could grow to become one. Mostly, and quietly, it's a real-time experiment in machine learning interpretability. The core mechanic, which transforms a player's text into a destination star system, is a direct application of techniques used to understand the inner workings of large language models like GPT-2.

Here's a simplified breakdown of how it works:

The Model: At the heart of our backend is a pre-trained GPT-2 model, managed using the transformer_lens library. GPT-2, like other Transformer models, is composed of layers, and each layer contains thousands of "neurons." These neurons work in concert to process and generate language.

Neuron Activations: When you input text (a "prompt") into the model, each neuron produces an activation value - a number that represents how strongly it responded to that specific input. A high activation suggests that the neuron is firing in response to a particular concept, pattern, or semantic feature it has learned to recognize.

Capturing the "Peak" Neuron: Our Python backend, specifically the experiment runner script, is designed to do one thing very efficiently:

It takes the player's prompt (e.g., "a forgotten melody").
It runs a "forward pass" through the GPT-2 model, calculating the activation values for every neuron in a specific layer (Layer 5, in this case).
It then identifies the single neuron with the highest activation value. This is the "peak activating neuron."
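
The three steps above can be sketched roughly as follows. This is a minimal sketch, not the actual experiment runner: the function names capture_mlp_activations and peak_neuron are hypothetical, treating "neurons" as the MLP post-activation values is an assumption, and so is taking the maximum over every token position (rather than, say, only the last token), since that detail isn't spelled out here. The transformer_lens calls only execute if you call the first function, which downloads GPT-2.

```python
from typing import List, Sequence, Tuple


def capture_mlp_activations(prompt: str, layer: int = 5) -> List[List[float]]:
    """Run one forward pass and return activations as [token][neuron].

    Assumption: "neurons" means the MLP post-activation values of the layer.
    """
    # Imported lazily so the rest of this sketch runs without the library.
    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained("gpt2")
    _, cache = model.run_with_cache(prompt)
    # Shape is [batch, position, d_mlp]; d_mlp is 3072 for GPT-2 small.
    return cache["post", layer][0].tolist()


def peak_neuron(activations: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (neuron_index, value) of the largest activation at any token."""
    best_index, best_value = -1, float("-inf")
    for token_acts in activations:
        for index, value in enumerate(token_acts):
            if value > best_value:
                best_index, best_value = index, value
    return best_index, best_value


# Toy data: 2 tokens x 4 neurons, standing in for real model activations.
toy = [[0.1, -0.3, 0.7, 0.0],
       [0.2, 1.5, -0.1, 0.4]]
print(peak_neuron(toy))  # prints (1, 1.5)
```

With the library installed, peak_neuron(capture_mlp_activations("a forgotten melody")) would give the neuron index for a real prompt.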

Note: The experimental code pre-dates the game jam, and was developed as part of my earlier efforts investigating how to "gamify" interpretability mechanics. In a previous attempt, documented on GitHub here, I tried a large sailing regatta, where GPT-2's mind was the ocean itself. It was far too complex to play without a decent in-depth understanding of LLM architectures, so I set it aside. I want to try to build games that help build intuition and understanding with very little prerequisite knowledge, and Panspermia is definitely a lot closer to that.

Neuron ID as a Game Coordinate: The index of this peak neuron (an integer from 0 to 3071) is then sent back to the Three.js client. In the game's fiction, this ID becomes the name of a star system (e.g., Planet L5-N1888 for Layer 5, Neuron 1888).
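
As a small illustration of that naming scheme, here is a sketch that turns a (layer, neuron) pair into a planet name and back again. The helper names planet_name and parse_planet_name are hypothetical, and the real client may build the string differently; only the "Planet L5-N1888" format and the 0 to 3071 range come from the description above.

```python
import re
from typing import Tuple

D_MLP = 3072  # MLP neurons per layer in GPT-2 small, hence IDs 0..3071


def planet_name(layer: int, neuron: int) -> str:
    """Format a layer/neuron pair as a star-system name."""
    if not 0 <= neuron < D_MLP:
        raise ValueError(f"neuron index {neuron} is outside 0..{D_MLP - 1}")
    return f"Planet L{layer}-N{neuron}"


def parse_planet_name(name: str) -> Tuple[int, int]:
    """Recover (layer, neuron) from a name produced by planet_name."""
    match = re.fullmatch(r"Planet L(\d+)-N(\d+)", name)
    if match is None:
        raise ValueError(f"not a planet name: {name!r}")
    return int(match.group(1)), int(match.group(2))


print(planet_name(5, 1888))  # prints Planet L5-N1888
```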

Why I find this super interesting!

Researchers in the field of interpretability have found that individual neurons often correspond to specific, learnable concepts. For example:

One neuron might activate strongly for text related to parentheses or code blocks.
Another might fire for sentences with a positive, celebratory tone.
A third could be linked to fantasy or medieval themes.

By mapping these neurons to planets, we are, in a sense, giving players a way to "tour" the conceptual space of the AI. When a player types "an ocean planet overflowing with neon jellyfish," the game doesn't just generate a random ocean planet. It finds the neuron in Layer 5 that responds most strongly to that prompt, which may correspond to something like "oceanic neon jellyfish planet" (or something semantically similar) within the model's vast web of knowledge, and sends the player there.

This creates a deterministic mapping that is sometimes surprisingly intuitive and sometimes counter-intuitive. Sometimes, similar prompts will lead to the same "neuronal" star systems, allowing players to build a mental model of the AI's internal landscape through pure exploration. Other times, there is a real difficulty in "interpreting" why a given neuron would fire for two very different inputs, and that's a first insight into the problems of interpretability. The "live mapping" script included in our files is a tool we used during development to batch-process large texts and pre-discover what kinds of concepts these neurons respond to, helping us build the game's underlying logic. This larger project builds on months of prior work focused more on interpretability than games, with the goal of someday creating something just like this. It might be the world's first playable map of an artificial mind, though it's not much to play, yet :3
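
To give a feel for what that kind of batch mapping involves, here is a minimal sketch (not the actual "live mapping" script). It groups text snippets by whichever neuron a peak-finding function returns for them. The names map_texts_to_neurons and fake_find_peak are hypothetical, and fake_find_peak is a loudly fake stand-in: it is deterministic but has nothing to do with GPT-2, so swap in a real model-backed function to get meaningful groups.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def map_texts_to_neurons(
    texts: Iterable[str],
    find_peak: Callable[[str], int],
) -> Dict[int, List[str]]:
    """Group texts by the neuron index that find_peak returns for each."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for text in texts:
        groups[find_peak(text)].append(text)
    return dict(groups)


def fake_find_peak(text: str) -> int:
    """Placeholder only, NOT a model: a deterministic toy mapping to 0..3071."""
    return sum(text.encode("utf-8")) % 3072


snippets = ["a forgotten melody", "an ocean planet", "a forgotten melody"]
mapping = map_texts_to_neurons(snippets, fake_find_peak)
for neuron, members in sorted(mapping.items()):
    print(neuron, members)
```

Because the mapping is deterministic, repeating a snippet always lands it in the same group, which is the same property that lets players revisit a star system by retyping a prompt.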

"Panspermia" is my most promising, exciting, and frankly (for me) fun-to-explore attempt to turn a diagnostic tool for AI research into a form of play, inviting players to not just use AI, but to connect with it on a more fundamental level. A place where play itself is a form of care <3.

I highly recommend giving this diagram a look over, especially the ideas for scalability that Gemini/GPT-5 offered up at the bottom left. These are good examples of how neural metrics can be adapted to game mechanics in various exciting ways, all at a fraction of the cost, and a fraction of the (ecological) footprint, of deploying a single "modern" model!

Panspermia

Explore the inner landscape of an artificial intelligence

Status	In development
Author	CriaFaar
Tags	ai, interpretability, machine-learning
