Last week was my first live performance with AI audio. I'm calling the piece Agency of Chaos, Unmoved.
If, like me, you are a purist who likes to experience work with an untainted mind, then you might like to listen to the recording to form your own impression before reading my interpretation here.
At the core of this piece are four versions of the audio-generating AI model RAVE. RAVE is an auto-encoder: it ingests sound, encoding it into its own internal language. It then decodes its own language back into sound. To train it, I give it hours of audio, and it optimises the encoding and decoding process to work well for that audio. (Think of a person training to listen, remember and vocalise a sound. They learn what to listen for, what details to remember and how to recreate a sound from those details.)
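For the technically curious, here is a minimal sketch of that round trip in code, assuming a pre-trained RAVE export (a TorchScript file) with the encode/decode interface the RAVE project documents. The filenames are placeholders, not my actual model or recordings, and my live setup works differently, but the idea is the same.

```python
# Minimal sketch: sound -> latents -> sound with an exported RAVE model.
# "rave_watts.ts" and "watts_lecture.wav" are hypothetical placeholder files.
import torch
import torchaudio

torch.set_grad_enabled(False)

model = torch.jit.load("rave_watts.ts")           # hypothetical RAVE export

audio, sr = torchaudio.load("watts_lecture.wav")  # may need resampling to the
x = audio.mean(0, keepdim=True).unsqueeze(0)      # model's training rate;
                                                  # shape (1, 1, samples), mono
z = model.encode(x)   # sound -> the model's internal "language" (latents)
y = model.decode(z)   # latents -> sound again, with the model's own accent

torchaudio.save("watts_reconstructed.wav", y.squeeze(0), sr)
```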
I trained four versions of the RAVE model. One on a collection of lectures by Alan Watts. Another on every sound I've recorded, including the few seconds attached to each Live Photo taken on my iPhone. A third on all the music and sound art I've ever made. The fourth was trained on a set of recordings of Adriana Minu, vocal performer and my wife. These were the first recordings of her experimental vocal practice after 10 years of not singing. In her words: “there is struggle and vulnerability in my early voice as I feel my way through new vocal territory”.
I played with combining these models in new ways. If I feed the sound of Alan Watts through the model trained on his own voice, I get a slightly distorted version out. The distortion has an uncanny nature to my ears, less like analogue noise or digital glitch, and more like a skilful robotic imitator slipping up here and there. Next, I tried running the models simultaneously and feeding the internal language encoded by one model into the decoder of a different model. The sound departs further. The dynamics and rhythm remain. The timbre is reminiscent but not quite there.
There was a magic moment where the focus of the piece became clear. I was gradually degrading the quality of the Alan Watts model by modifying its internal representation between encoder and decoder. When I fed these encodings from the Alan Watts model directly into the decoder of Adriana’s model, something else emerged. These sounds were eerie. The struggle of the human combines with the uncanny valley of the AI. In the errors and distortions I hear the struggle of a living being: effort, intention, agency.
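Spelled out in code, the rerouting looks roughly like this: encode with one model, nudge the latents to degrade them, then decode with a different model. Again a sketch with hypothetical filenames, and it assumes the two exports share the same latent dimensionality; the particular perturbations here are illustrative, not the ones I performed with.

```python
# Sketch of cross-model rerouting: Watts encoder -> degraded latents -> Adriana decoder.
# Model and audio filenames are hypothetical placeholders.
import torch
import torchaudio

torch.set_grad_enabled(False)

watts = torch.jit.load("rave_watts.ts")        # hypothetical exports
adriana = torch.jit.load("rave_adriana.ts")

audio, sr = torchaudio.load("watts_lecture.wav")
x = audio.mean(0, keepdim=True).unsqueeze(0)   # (1, 1, samples)

z = watts.encode(x)                            # Watts encoder: sound -> latents

# Degrade the internal representation: add noise and drift one latent dimension.
z = z + 0.5 * torch.randn_like(z)
z[:, 0] += torch.linspace(-1.0, 1.0, z.shape[-1])

y = adriana.decode(z)                          # Adriana decoder: latents -> sound

torchaudio.save("watts_through_adriana.wav", y.squeeze(0), sr)
```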
There's an ambiguity in where that struggle is rooted. Is it Adriana's struggle appropriated by the AI? Is it the AI models trying to get through as I reroute their internals into each other, Frankenstein style?
At the gig, I played using some MIDI controllers I borrowed from a friend. Having spent the past few years building AI-powered embodied gestural interfaces, I found it odd to perform with knobs and buttons. But this piece is a step on a bigger journey. Each new component brings its own character, whether that’s of sound, image, physicality or how those three connect. It makes sense to me to get a feel for that character by working with the models individually first.
Afterwards, someone told me she found it creepy, adding “but in a good way”. Another told me that at the end she turned to the friend sitting next to her and said she felt empty; her friend said she felt completely filled. A third person said it was unlike anything she’d heard before. I found this last comment most validating, which is interesting because I complain about the cult of the new in the tech-art scene, where novelty gets valued above depth and feeling. Performing with AI audio definitely runs the risk of being little more than a gimmick, so I think the validation is in uncovering something that sounds new, at least to me and to someone in the audience.
Tim
Montreal, 2 June 2023