#practice-research #sound-art #machine-learning #alvin-lucier #llm #transformers #audio-ml #mechanistic-interpretability

LUCIER'S ROOM

Research Notes

Living document. Written first for myself, then for anyone curious. Updated as the work changes.
LLM "assisted" formatting, checked for mistakes as far as humanly possible.

The original

Alvin Lucier's I Am Sitting in a Room (1969). A voice is recorded, played back into a room and re-recorded, and the cycle is repeated 32 times. The room's resonant frequencies destroy the speech and replace it with themselves. The room is passive. It doesn't interpret. It just has a geometry.

What is the LLM analog of this?

The problem with the metaphor/analogy

A physical room is often modeled as a fixed filter - but that's an approximation that holds only under stable conditions. Fixed source, fixed receiver, stable temperature, no movement. In practice room acoustics vary with all of these.

What Lucier's piece actually relies on is that the room is passive. No goal. No interpretation. It reshapes whatever passes through it consistently enough that the resonant frequencies accumulate across 32 iterations.
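Under the fixed-filter idealization the whole piece reduces to a line of algebra: every pass convolves with the same impulse response h, so after n passes the spectrum is X(f)·H(f)^n, and whatever frequencies H favors most take over. A minimal numpy sketch, assuming a synthetic IR built from three decaying sinusoids as stand-in room modes (the frequencies and decay rate are made up, not measured from any room) and white noise as the "voice":

```python
import numpy as np

SR = 44100       # sample rate (Hz)
N_PASSES = 32    # record/playback generations, as in the piece

def synthetic_room_ir(sr=SR, length=4096, modes=(220.0, 587.0, 1310.0), decay=60.0):
    """Hypothetical room IR: decaying sinusoids standing in for room modes."""
    t = np.arange(length) / sr
    ir = sum(np.exp(-decay * t) * np.sin(2 * np.pi * f * t) for f in modes)
    return ir / np.max(np.abs(ir))

def lucier_passes(signal, ir, n_passes=N_PASSES):
    """Filter `signal` through the same `ir` n_passes times.

    Done in the frequency domain: n repeated linear convolutions multiply
    the spectrum by H(f)**n, so the strongest resonances of the IR win.
    """
    n = len(signal) + n_passes * (len(ir) - 1)   # length of the full linear result
    out = np.fft.rfft(signal, n)
    H = np.fft.rfft(ir, n)
    for _ in range(n_passes):
        out = out * H
        out /= np.max(np.abs(out))               # renormalize each pass, like resetting levels
    return np.fft.irfft(out, n)

def dominant_frequency(x, sr=SR):
    """Frequency (Hz) of the largest-magnitude bin of x."""
    mag = np.abs(np.fft.rfft(x))
    return float(np.fft.rfftfreq(len(x), d=1.0 / sr)[np.argmax(mag)])
```

Usage: `lucier_passes(np.random.default_rng(0).standard_normal(SR), synthetic_room_ir())` returns a signal whose dominant frequency should sit near one of the three synthetic modes. The point of the sketch is the contrast: a fixed H is exactly what an LLM is not.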

An LLM is not passive. Its weights are fixed within a single run - same parameters, same architecture. But what those weights compute depends entirely on what you feed in. The same model, different input, completely different computational path through its layers. The room changes shape depending on what you say to it. And if you change models - different training run, different size, different fine-tune - you have a different room altogether.

Two projects

The first treats the model as a room with a worldview. Same text fed in every iteration as a constant input. The model's own previous outputs accumulate as context. Over 32 iterations the response drifts. What the model amplifies, what it drops, what it converges toward - that's the room's resonant signature.

Even with no instructions, no system prompt, the model cannot receive text without "interpreting" it. Lucier's room had no semantics. This room has absorbed the entirety of its training data and has opinions.

The second goes inside the model. Instead of capturing what the model outputs as text, it captures what happens in the model's internal mathematics at each iteration - residual stream activations, attention patterns, logit distributions, how the representation of the original text drifts through the model's geometry over 32 passes.

What going inside means

A language model processes text by converting words into vectors - coordinates in a high-dimensional space. As these vectors pass through the model's layers, they are modified by attention mechanisms and feed-forward networks. The residual stream acts as a running total, accumulating these modifications layer by layer until the final layer translates the vectors into a probability distribution over the model's vocabulary.
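The "running total" can be written out. Below is the standard textbook form for a pre-LayerNorm decoder in the GPT-2 style, not something measured in this project: t_i is the token at position i, W_E and W_pos are the token and learned position embeddings, W_U is the unembedding, L is the number of blocks, and LN is LayerNorm (RMSNorm in Qwen2.5). Pythia and Qwen2.5 use rotary position embeddings inside attention instead of a W_pos term, and Pythia adds the attention and MLP outputs in parallel from the same input rather than in sequence.

```latex
\begin{aligned}
x^{(0)}_i &= W_E[t_i] + W_{\text{pos}}[i] \\
x^{(\ell)}_{\text{mid}} &= x^{(\ell-1)} + \mathrm{Attn}^{(\ell)}\!\big(\mathrm{LN}(x^{(\ell-1)})\big) \\
x^{(\ell)} &= x^{(\ell)}_{\text{mid}} + \mathrm{MLP}^{(\ell)}\!\big(\mathrm{LN}(x^{(\ell)}_{\text{mid}})\big), \qquad \ell = 1, \dots, L \\
p(t_{i+1} \mid t_{\le i}) &= \mathrm{softmax}\big(W_U \, \mathrm{LN}(x^{(L)}_i)\big)
\end{aligned}
```

In this notation hook_resid_post reads x^{(ℓ)}, the output of block ℓ (TransformerLens counts blocks from 0, so that is `blocks.{ℓ-1}.hook_resid_post`).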

Instead of just looking at the generated text, this project captures the internal mechanics of this process across 32 iterations:

The Vector State: A snapshot of the residual stream (hook_resid_post) at various layer depths to see exactly how the internal representation of the text changes.
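The Drift Metric: Cosine distance (1 minus cosine similarity) to measure how far the model's internal state has drifted from the initial seed text. A distance of 0 means the representation points in the same direction as the seed's; a distance of 1 means the two are orthogonal, i.e. the model is processing something geometrically unrelated to where it started. The scale runs up to 2, which would mean pointing in the opposite direction.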
The Final Output: The logit distribution at the final layer, mapping how the model's vocabulary choices narrow down as it converges toward an "attractor".
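A minimal sketch of the drift metric in numpy, assuming one layer's residual activations are already available as `[seq_len, d_model]` arrays (with TransformerLens these would typically come from `HookedTransformer.run_with_cache`, under cache keys of the form `blocks.{layer}.hook_resid_post`). Pooling by the mean over token positions follows the "mean residual stream" used for the drift panel; the function names are illustrative.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-12):
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def drift_curve(seed_resid, iteration_resids):
    """Drift of the mean residual stream from the seed reference, per iteration.

    seed_resid: [seq_len, d_model] activations for the seed text at one layer.
    iteration_resids: list of [seq_len_i, d_model] arrays, one per iteration;
    sequence lengths may differ because each is mean-pooled over positions.
    """
    ref = np.asarray(seed_resid).mean(axis=0)
    return [cosine_distance(ref, np.asarray(r).mean(axis=0)) for r in iteration_resids]
```

Running `drift_curve` once per layer gives the per-layer fan lines; running it on a single chosen layer gives the main curve.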

The models as rooms

Different models produce different rooms. This turned out to matter more than expected.

GPT-2 (released in 2019) was trained on WebText, a scrape of web pages linked from Reddit. Its resonant frequencies are conversational, forum-like. It collapses toward a particular kind of internet register fast.

Pythia, trained on the Pile (a mix of academic papers, code, books and web text), is more distributed. Smaller models converge to attractors faster; larger ones take longer.

Qwen2.5 Instruct models are fine-tuned to be helpful assistants. Even without explicit instructions they drift toward response-like outputs.

Loop modes

Five different feedback structures, each producing mechanistically different activation patterns:

chain - output of iteration 1 becomes input to iteration 2. Seed appears once and never again. Direct Lucier structure.

accumulate - seed kept at the start of every iteration, outputs accumulate as context. The seed never disappears but gets buried.

attention-morph - words in the input are selectively replaced by words from the output, based on which words the model attended to most strongly. A threshold rises over iterations. The model is editing its own input.

instruct-compress - generate a long output (600 tokens), then apply an instruction: summarize in one paragraph. On instruct-tuned models this works - and importantly, the chat template is applied correctly so the model receives the instruction in the format it was trained on. On base models (GPT-2, Pythia) the instruction has no effect - the model treats it as more text to continue from.

kv-compress - works directly on the model's internal state. Captures the key-value cache, prunes it by attention weight, injects the pruned version back in.
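The two simplest feedback structures differ only in how each iteration's prompt is built. A sketch in plain Python, where `generate` is a hypothetical stand-in for the actual model call and the blank-line separator between accumulated outputs is an assumption. It ignores context-window limits, which `accumulate` will eventually hit as outputs pile up.

```python
from typing import Callable, List

def run_loop(seed: str, generate: Callable[[str], str],
             mode: str = "chain", n_iter: int = 32) -> List[str]:
    """Run a text feedback loop and return every iteration's output.

    chain:      the previous output is the whole next input; the seed is used once.
    accumulate: the seed stays first and all previous outputs follow it as context.
    """
    outputs: List[str] = []
    for _ in range(n_iter):
        if mode == "chain":
            prompt = outputs[-1] if outputs else seed
        elif mode == "accumulate":
            prompt = "\n\n".join([seed] + outputs)
        else:
            raise ValueError(f"mode not covered by this sketch: {mode}")
        outputs.append(generate(prompt))
    return outputs
```

With a toy generator such as `lambda p: f"<{len(p)}>"`, chain mode keeps returning the same short string, while accumulate mode's prompts grow every iteration: the seed is never lost, only buried.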

The audio expansion - three paths

A voice recording is fed in alongside the text. At each of the 32 iterations the audio is transformed using information derived from the model's internal state.

How do you turn a language model's internal geometry into an acoustic filter?

Path 1 derives a single impulse response from the model's weight matrices before any text is processed. Computed once, applied identically across all 32 iterations. Closest structural analog to Lucier. One room, one fixed signature.

Path 2 derives a new impulse response at every iteration from the model's activation state as it processes that iteration's text. Because the model's computation genuinely changes with each input, these 32 IRs are 32 different rooms. The drift in the IR is the content.

Path 3 maps the model's probability distribution over its vocabulary to a spectral envelope - a shaping of the audio's frequency content. As the model converges toward its attractor vocabulary the distribution narrows. The voice is progressively constrained by the model's statistical certainty.

All three run simultaneously on the same model pass, producing three parallel audio chains from the same 32 iterations.
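These notes say path 3 turns the vocabulary distribution into a spectral envelope applied by overlap-add STFT, but not how probabilities map to frequencies, so the mapping below is purely hypothetical: the top-32 probabilities, sorted and peak-normalized, are spread across the frequency axis as band gains. A flat distribution then leaves the spectrum almost untouched, while a peaked one squeezes the voice into the lowest band. `probs_to_envelope` and `apply_envelope` are illustrative names; `scipy.signal.stft` and `scipy.signal.istft` are real SciPy functions, and the inverse resynthesizes by overlap-add.

```python
import numpy as np
from scipy.signal import stft, istft

def probs_to_envelope(probs, n_bins, n_bands=32, floor=1e-3):
    """Hypothetical mapping from a token distribution to per-bin gains.

    The n_bands largest probabilities (sorted, divided by the maximum) are
    placed evenly from the lowest to the highest frequency bin and linearly
    interpolated in between; `floor` keeps every bin slightly open.
    """
    top = np.sort(np.asarray(probs, dtype=np.float64))[::-1][:n_bands]
    top = top / top.max()
    band_pos = np.linspace(0, n_bins - 1, num=len(top))
    env = np.interp(np.arange(n_bins), band_pos, top)
    return np.maximum(env, floor)

def apply_envelope(audio, probs, sr=44100, nperseg=2048):
    """Scale every STFT frame by the envelope, then resynthesize by overlap-add."""
    _, _, Z = stft(audio, fs=sr, nperseg=nperseg)          # Z: [n_freqs, n_frames]
    env = probs_to_envelope(probs, Z.shape[0])
    _, y = istft(Z * env[:, None], fs=sr, nperseg=nperseg)
    return y[: len(audio)]                                  # istft output is padded; trim
```

Any mapping with the same monotone behavior (more certainty, less spectrum) would serve the idea; this one was chosen only because it is easy to check.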

Voice input

Upload a WAV file, or use F5-TTS voice cloning: upload a short reference recording and the system clones the voice, then synthesizes the model's text output in it at each iteration. The model generates text; the text is synthesized as speech in the voice of the person who started the process; that audio is convolved with an IR derived from the model's own geometry.

The voice is a variable in a way it isn't in Lucier. In Lucier it is his voice, in his room. Here it can be a recording made in a specific room (adding a second acoustic layer before the model touches it), or a synthesized voice driven by the model's own output.

Why these four hooks

hook_resid_post is the right primary probe - the residual stream after both attention and MLP have contributed. The most information-dense single read point per layer. It's what produces the drift curves, the per-layer fan lines, and the seed token heatmap.

hook_pattern gives the full attention pattern (per head, query position by key position); the app reads the row for the final token position - which tokens the model is attending to when predicting what comes next. This drives the attention-morph mode (words the model attends to strongly survive; others are replaced) and the attention-to-seed metric.
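A sketch of the attention-morph replacement step, under assumptions these notes don't spell out: one attention score per input word (for example the final-position row from hook_pattern, averaged over heads and summed over each word's tokens), scores rescaled by their maximum, output words substituted position by position, and a linear threshold schedule from 0.1 to 0.9 over 32 iterations. All names and constants are hypothetical.

```python
from typing import List, Sequence

def morph_threshold(iteration: int, n_iter: int = 32,
                    start: float = 0.1, end: float = 0.9) -> float:
    """Hypothetical linear schedule: the bar for survival rises each iteration."""
    frac = iteration / max(n_iter - 1, 1)
    return start + (end - start) * frac

def attention_morph(input_words: List[str], output_words: Sequence[str],
                    attention: Sequence[float], threshold: float) -> List[str]:
    """Keep strongly attended input words; replace the rest with output words.

    attention: one score per input word, rescaled here so the maximum is 1.
    Words with no output word at the same position are kept unchanged.
    """
    peak = max(attention) or 1.0
    morphed = []
    for i, word in enumerate(input_words):
        strong = attention[i] / peak >= threshold
        if strong or i >= len(output_words):
            morphed.append(word)
        else:
            morphed.append(output_words[i])
    return morphed
```

Because the threshold rises, fewer input words clear it in later iterations, so the input is increasingly rewritten in the model's own output vocabulary.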

hook_k and hook_v are there for a specific instrumental reason: the kv-compress loop mode needs them. It captures the full KV cache, prunes positions by attention weight, and re-injects the pruned state.
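The pruning half of kv-compress might look like the numpy sketch below: average the final-position attention over heads, keep the most-attended cache positions, and preserve their original order. The `[n_heads, seq_len, d_head]` layout and the 50% keep fraction are assumptions for readability; TransformerLens lays out hook_k and hook_v activations as `[batch, pos, head_index, d_head]`, so they would need indexing and transposing first.

```python
import numpy as np

def prune_kv(keys, values, attention, keep_frac=0.5):
    """Keep the cache positions that received the most attention.

    keys, values: [n_heads, seq_len, d_head] for one layer (assumed layout).
    attention: [n_heads, seq_len] attention from the final query position.
    Returns pruned keys, pruned values and the kept position indices.
    """
    scores = np.asarray(attention).mean(axis=0)            # average over heads
    n_keep = max(1, int(round(keep_frac * scores.shape[0])))
    top = np.argsort(scores)[::-1][:n_keep]                 # most-attended first
    keep = np.sort(top)                                     # restore original order
    return keys[:, keep, :], values[:, keep, :], keep
```

Re-injection, the harder half, is not shown: dropping positions changes how the surviving keys line up with positional information (learned positions in GPT-2, rotary embeddings in Pythia and Qwen2.5), and any working version has to decide how to handle that.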

hook_q is not hooked because nothing in the app needs to intercept it. hook_pattern already gives the result of the Q-K dot products after softmax - the attention distribution is visible without capturing the raw queries.

Trajectory mode

In standard mode, activations are captured once per iteration on the full context. In trajectory mode, the residual stream is also captured at each token step during generation - one measurement per generated token, not just per iteration.

This gives a much higher-resolution picture of how the model's internal state moves as it generates each word. The per-iteration measurement shows the room's signature after each full pass. Trajectory mode shows the moment-to-moment drift within a single iteration's generation.

It's slower - one extra forward pass per token - and only available on modes that use streaming generation.
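The trajectory loop can be sketched with two hypothetical callables: `step` performs one normal generation step and returns the next token id, and `capture` is the extra forward pass that returns a residual-stream read-out (say, the mean hook_resid_post vector at one layer) for the sequence so far. Each generated token then contributes one drift value against the seed reference `ref`.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

def _cos_dist(a, b):
    """1 - cosine similarity between two vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def generate_with_trajectory(
    prompt_ids: Sequence[int],
    step: Callable[[List[int]], int],
    capture: Callable[[List[int]], np.ndarray],
    ref: np.ndarray,
    max_new_tokens: int,
) -> Tuple[List[int], List[float]]:
    """Generate token by token, recording drift from `ref` after every token."""
    ids = list(prompt_ids)
    drift: List[float] = []
    for _ in range(max_new_tokens):
        ids.append(step(ids))          # normal generation step
        resid = capture(ids)           # extra forward pass: residual read-out
        drift.append(_cos_dist(ref, resid))
    return ids, drift
```

The second callable is where the slowdown comes from: every token costs one more forward pass than plain generation.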

Audio technical specifics

Format: WAV, 44100Hz, mono, 32-bit float throughout.

Input: Any WAV (mono or stereo, any sample rate), converted internally to 44100Hz mono on load.

IR length: 4096 samples = 93ms at 44100Hz. All kernels are padded or trimmed to this length before convolution. For mean-residual-direct on small models (d_model=768, e.g. pythia-160m and gpt2), the kernel is 768 samples and zero-padded to 4096.

Convolution: scipy.signal.fftconvolve(audio, kernel, mode='full'), output trimmed to input length. Path 3 uses overlap-add STFT rather than direct convolution - the spectral envelope is applied in the frequency domain.

Generation defaults: temperature 0.8; max_new_tokens defaults to the seed's length in tokens (~115, depending on the model's tokenizer).
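A sketch of the convolution path with SciPy, following the specifics above: kernels padded or trimmed to 4096 samples, `scipy.signal.fftconvolve` in `'full'` mode, output trimmed to the input length, float32 throughout. The mean-residual-direct kernel here reads the position-averaged residual vector (768 values for gpt2 or pythia-160m) directly as samples; averaging over positions and peak normalization are assumptions, and the function names are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

IR_LEN = 4096  # 4096 samples, about 93 ms at 44100 Hz

def fit_kernel(kernel, length=IR_LEN):
    """Zero-pad or trim a kernel to exactly `length` samples."""
    kernel = np.asarray(kernel, dtype=np.float32)
    if len(kernel) >= length:
        return kernel[:length]
    return np.pad(kernel, (0, length - len(kernel)))

def mean_residual_kernel(resid):
    """Mean residual vector over positions, read as samples and peak-normalized.

    resid: [seq_len, d_model] activations; the result has d_model samples.
    """
    k = np.asarray(resid, dtype=np.float32).mean(axis=0)
    peak = np.max(np.abs(k))
    return k / peak if peak > 0 else k

def convolve_trimmed(audio, kernel):
    """Full linear convolution with the fitted kernel, trimmed to len(audio)."""
    out = fftconvolve(audio, fit_kernel(kernel), mode="full")
    return out[: len(audio)].astype(np.float32)
```

Trimming the `'full'` result to the input length keeps every iteration's audio the same duration, at the cost of cutting off the last 4095 samples of reverberant tail.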

What the interface shows

Both LUCIER'S ROOM and Lucier Expanded use the same three-column layout: controls left, text drift center, visualizations right. Four panels on the right, all reading from the same activation snapshot:

Hidden state drift - the primary signal. Solid line: cosine distance of the mean residual stream from the seed reference. Fan lines: per-layer drift. Dashed line: fraction of attention directed at seed token positions. In trajectory mode, purple dots show per-token drift during generation.
IR waveform / envelope - time-domain display of the current IR kernel (paths 1 and 2) or spectral envelope (path 3). Path 1 is static across iterations; path 2 updates every iteration.
Seed token drift - heatmap. Each row is one token position in the seed; each column is an iteration. Brightness encodes how far that position's representation has drifted.
Logit distribution - top-k token probability heatmap. Rows are tokens, columns are iterations. Tokens stabilizing across iterations are the model's attractor vocabulary.
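Two of these panel signals are simple reductions over captured tensors. A numpy sketch, assuming one layer's hook_pattern for a single batch item as `[n_heads, query_pos, key_pos]` (TransformerLens stores it as `[batch, head_index, query_pos, key_pos]`), heads averaged with equal weight, and the seed occupying known key positions. For the logit panel, the rows are taken to be the union of each iteration's top-k tokens; k and the function names are hypothetical.

```python
import numpy as np

def attention_to_seed(pattern, seed_positions):
    """Fraction of the final query position's attention that lands on the seed.

    pattern: [n_heads, query_pos, key_pos] attention probabilities for one layer.
    seed_positions: indices (or a slice) of the seed tokens in the context.
    """
    last = np.asarray(pattern)[:, -1, :]                  # [n_heads, key_pos]
    return float(last[:, seed_positions].sum(axis=1).mean())

def softmax(logits):
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()                                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

def logit_heatmap(final_logits, k=10):
    """Rows: union of every iteration's top-k token ids. Columns: iterations.

    final_logits: one [vocab] array of next-token logits per iteration.
    Returns (token_ids, heat) with heat[row, iteration] = probability.
    """
    probs = [softmax(l) for l in final_logits]
    token_ids = sorted({int(t) for p in probs for t in np.argsort(p)[::-1][:k]})
    heat = np.array([[p[t] for p in probs] for t in token_ids])
    return token_ids, heat
```

A row that stays bright across later columns is a token the model keeps returning to, i.e. part of the attractor vocabulary; the ids would be decoded with the model's tokenizer for display.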

When audio from a completed run plays back, the corresponding words in the seed text highlight in real time, synchronized to the audio duration. The body and the text run on the same timeline.

A score view (View C) shows all 32 iterations as a table: full output text, cosine distance from seed, top predicted tokens.

What the signals actually tell you

The captured signals I track, roughly in order of how confidently I can read them:

Residual stream drift - direct measurement of how the model's internal representation of the original text changes as context accumulates. If the curve plateaus, the model has found an attractor. Whether that's meaningful or a repetition artifact of a small fine-tuned model is not always clear from the curve alone.

Logit distribution - the model's probability mass over its vocabulary. Watching it narrow over 32 iterations is watching the room's resonant frequencies assert themselves.

Seed token drift - how far the representation at each position of the original text has drifted across iterations. Which words survive; which decay.

KV cache - I'm less certain. It's a computational artifact. Drift in KV cache state tells you something about how the model's processing has changed but what exactly is harder to interpret cleanly.
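Two crude quantities could put numbers on "narrowing" and "plateau". A sketch, not part of the app as described: the Shannon entropy (in bits) of the next-token distribution, which falls as probability mass concentrates, and a naive plateau detector that reports the first iteration after which a curve stays inside a tolerance band. Like the curves themselves, it flags flatness only; it cannot tell an attractor from a repetition artifact. The tolerance and minimum length are arbitrary.

```python
import numpy as np

def entropy_bits(logits):
    """Shannon entropy (bits) of softmax(logits); lower means a narrower distribution."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()                    # numerical stability
    p = np.exp(z)
    p /= p.sum()
    p = p[p > 0]                       # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())

def plateau_start(curve, tol=0.01, min_len=4):
    """First index i such that curve[i:] (at least min_len points) stays within tol.

    Returns None if the curve never settles.
    """
    c = np.asarray(curve, dtype=np.float64)
    for i in range(len(c) - min_len + 1):
        tail = c[i:]
        if tail.max() - tail.min() < tol:
            return i
    return None
```

For reference, a uniform distribution over 8 tokens has exactly 3 bits of entropy; a vocabulary of roughly 50,000 tokens tops out near 15.6 bits.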

Loose Ends

A model cannot receive text without doing something semantic with it. Lucier's room had no semantics. Not sure whether that makes this work a broken version of the same idea or an honest description of a different kind of space.

What is the voice? In Lucier it's his voice, in his room. Here it can be a recording made in a specific room, which adds a second acoustic layer before the model touches it. Or a voice synthesized from a reference, driven by the model's own output.

When is a plateau an attractor and when is it a 'malfunction'? Small instruct models sometimes lock into repetitive output on the first iteration. The measurements don't clearly distinguish a genuine attractor from a fine-tuning artifact. I might need to look for better tools for this.
Also, from what I have read so far, a flat plateau in an instruct-tuned model (e.g., Qwen2.5) may not be a complex dynamical attractor at all but a sink created by safety alignment or RLHF. Instruction tuning tends to narrow output variance in favor of helpful, predictable responses. Base models (e.g., Pythia, GPT-2) lack these constraints, so they could have genuine structural attractors.

References

Lucier, Alvin. I Am Sitting in a Room. 1969. Performed at the Guggenheim Museum, New York.
Nanda, Neel et al. TransformerLens. https://github.com/TransformerLensOrg/TransformerLens
Gao, Leo et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. 2020. https://arxiv.org/abs/2101.00027
Biderman, Stella et al. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. 2023. https://arxiv.org/abs/2304.01373

Last updated: 2026-05-30. Ongoing.

→luciers-room