I started to build (with a ton of code assistance) a simulated environment with small creatures. They had tiny neural networks for brains and abstracted bodies - not literally physical, but with some sort of morphology. The idea was that the body and brain would co-evolve: as the morphology changed, the neural architecture would adapt to support it. Reinforcement learning rewards them for hitting landmarks, and genetic algorithms shape them across generations.
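For illustration, the generational half of that loop could look something like the sketch below. Everything in it is a hypothetical stand-in rather than the project's code: the `Genome` class, the single "number of sensors" morphology gene, the two outputs, and the mutation rates are all assumptions. The point is only to show one way a brain can be resized when the body changes. The within-lifetime reinforcement learning is left out; fitness is simply passed in.

```python
import random
from dataclasses import dataclass

N_OUTPUTS = 2  # assumed motor outputs, e.g. thrust and turn


@dataclass
class Genome:
    n_sensors: int  # morphology gene (hypothetical): how many sensors the body has
    weights: list   # brain genes: one row of n_sensors input weights per output


def random_genome(rng, n_sensors=3):
    weights = [[rng.uniform(-1, 1) for _ in range(n_sensors)]
               for _ in range(N_OUTPUTS)]
    return Genome(n_sensors, weights)


def mutate(parent, rng, p_morph=0.2, sigma=0.1):
    """Jitter the brain; occasionally change the body and resize the brain to match."""
    n = parent.n_sensors
    weights = [[w + rng.gauss(0, sigma) for w in row] for row in parent.weights]
    if rng.random() < p_morph:
        if rng.random() < 0.5 or n == 1:
            # Body grows a sensor, so every output gets one new input weight.
            n += 1
            for row in weights:
                row.append(rng.gauss(0, sigma))
        else:
            # Body loses a sensor, so the matching input weight is dropped.
            n -= 1
            for row in weights:
                row.pop()
    return Genome(n, weights)


def next_generation(population, fitnesses, rng, elite_frac=0.25):
    """Keep the fittest genomes unchanged and refill the pool with their mutants."""
    ranked = [g for _, g in sorted(zip(fitnesses, population),
                                   key=lambda pair: pair[0], reverse=True)]
    elites = ranked[:max(1, int(len(ranked) * elite_frac))]
    children = [mutate(rng.choice(elites), rng)
                for _ in range(len(population) - len(elites))]
    return elites + children
```

In a sketch like this the fitness list would come from however many landmarks each creature reached during its lifetime; the selection scheme (plain elitism plus mutation, no crossover) is just the simplest thing that works, not a claim about what was actually used.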
When a creature hit a landmark, it triggered a sound. The creatures had no sense of that sound - no acoustic perception, not even at the code level. They were just learning to navigate. But from my position as observer, watching the creatures converge on their targets across generations, the sounds they incidentally triggered formed something like a stochastic sequencer. By mapping velocity, rotation, and position values to simple synthesis parameters, I could hear these patterns, not just see them. The patterns are fundamentally deterministic and, because the creatures converge, eventually rather obvious. But they are strange and alive before that - and if you change target positions mid-run, you get very uncontrolled rhythmic or arpeggiated motifs, at least until the creatures start kamikazeing all at once.
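A mapping of this kind can be very small. The sketch below is one possible version, not the mapping actually used: the parameter names (`amplitude`, `cutoff_hz`, `detune_cents`), the ranges, and the choice of which motion value drives which parameter are all assumptions made for illustration.

```python
import math


def clamp(value, low, high):
    return max(low, min(high, value))


def state_to_synth(x, y, vx, vy, angular_velocity,
                   world_size=100.0, max_speed=5.0, max_spin=math.pi):
    """Map one creature's motion state to a few synthesis parameters."""
    speed = math.hypot(vx, vy)
    height = clamp(y / world_size, 0.0, 1.0)
    spin = clamp(angular_velocity / max_spin, -1.0, 1.0)
    return {
        # Faster creatures are louder (0.0 to 1.0).
        "amplitude": clamp(speed / max_speed, 0.0, 1.0),
        # Vertical position sweeps a filter over five octaves, 200 Hz to 6400 Hz.
        "cutoff_hz": 200.0 * 2 ** (5 * height),
        # Rotation bends the pitch by up to 50 cents either way.
        "detune_cents": 50.0 * spin,
    }
```

The exponential curve on the cutoff is a deliberate choice: pitch and brightness are heard roughly logarithmically, so a linear sweep in Hz would bunch most of the audible change into the bottom of the world.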
The interesting part: I am the evolutionary pressure. The reward function is my decision about what good behaviour looks like. Whatever the creatures evolve toward is a reflection of my values back at me, without me intending it. That's not just a technical constraint. It's a genuinely uncomfortable philosophical position to be in.
The rough roadmap from here: step one is done - train them to touch waypoints. Step two is using simple sounds to reveal the rhythmic patterns inherent in each generational pool of learned movements. The current implementation: waypoints are scale degrees (pure representation, not in the NN inputs - we only care about the waypoint sequence as a goal, to simplify learning and generation selection). Notes are restricted for now to chord notes determined by the degree. There is simple envelope control for filter, volume, and reverb amount, with dampening based on velocity and on entity density within a radius. Spatialization is also simple - stereo only, based on coordinates. After 300 generations you get something. In this iteration anyway. Step three is exploring increasingly complex mappings between behaviour, space, and sound properties until interesting structures appear. That last one is going to take the longest - right now they simply bleep bloop, pretty boring. But the goal is something that feels stochastic in process. A weird sequencer/synth that grows itself.
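The step-two mappings can be sketched roughly as follows. This is a guess at the shape of such code, not the implementation itself: the major scale, the root note, the equal-power pan law, and the assumption that crowding makes a voice quieter are all choices made here for illustration, and the velocity part of the dampening is omitted.

```python
import math

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets; the scale is an assumption


def chord_notes(degree, root_midi=60):
    """Diatonic triad on a 1-based scale degree, as MIDI note numbers."""
    notes = []
    for step in (0, 2, 4):  # root, third, fifth in scale steps
        octave, position = divmod((degree - 1) + step, 7)
        notes.append(root_midi + 12 * octave + MAJOR_SCALE[position])
    return notes


def stereo_gains(x, world_width=100.0):
    """Equal-power pan from the x coordinate; returns (left, right) gains."""
    t = max(0.0, min(1.0, x / world_width))
    angle = t * math.pi / 2
    return math.cos(angle), math.sin(angle)


def density_damping(me, others, radius=10.0):
    """Gain multiplier that shrinks as more creatures crowd within the radius."""
    neighbours = sum(1 for other in others if math.dist(me, other) <= radius)
    return 1.0 / (1.0 + neighbours)


print(chord_notes(1))  # [60, 64, 67]
print(chord_notes(5))  # [67, 71, 74]
```

A creature hitting the waypoint for degree 5 would then pick one of those three notes, be panned by `stereo_gains` from its x position, and be scaled by `density_damping` so that a swarm arriving together does not simply clip.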
One aside: sonifying the weights and connections of the NNs directly is a possible direction, but probably too obvious. It would likely just reproduce converging fields, like an orchestra tuning - not necessarily harmonically pleasant in any traditional sense (not that they have to be), but also not necessarily more interesting...
I'm also thinking about what it would mean to simulate acoustic perception: to give these creatures something like hearing, and let them evolve it. The problem is that any model I build of what hearing is for would again be my choice. My description of what counts as useful listening would encode my assumptions about what sound is for. I can't surprise myself, I think... Which raises the question of what listening is actually for. In the real world, in the evolutionary sense, before intelligence, before intention, it was ultimately just pattern matching that happened to keep something alive long enough to reproduce and pass it on. The usefulness of hearing is different from "listening", though, isn't it? And I don't know how to write a reward function for the difference without my own biases already deciding the potential 'musical' outcomes.
Afterwards I found a research paper doing something similar, at least in terms of evo-devo, though it was about the evolution of the eye. Either the algorithm knew what I was looking for, or I was just thinking about the same problems everyone else working in this space eventually arrives at.
If you'd like to experiment with RL yourself: Gymnasium