Jonathan Reus

The politics and possibilities at the edges of voice AI

DADAsets is a response to the cultural and economic ecosystem of voice AI and voice data that is rapidly terraforming the meaning and function of voice. The project researches and develops open digital music tools and bespoke voice datasets that challenge the popular narratives and agendas around voice AI – narratives that dwell on the spectacle and fears surrounding the technological results, such as digital clones that perfectly reproduce the voice of a famous narrator or pop singer. DADAsets aims to create work that playfully and artistically foregrounds the less visible labor and relationships around voice AI. Where the dominant narrative of generative AI is focused on the shock and awe of its visible results, DADAsets seeks to decenter the spectacle and instead foreground AI’s total dependence on a complex ecosystem of training data – an ecosystem that is often obscured and involves huge amounts of hidden labor.

The project will involve a series of public workshops to map out communities of voice AI and voice data, and to create a proof-of-concept public dataset (a DADAset) as an archetype for future diverse and ethically sourced voice datasets, especially those that fall outside the mainstream economies of voice data and music AI. These DADAsets will be carefully crafted in collaboration with vocal artists who sit outside those cultural and economic value systems – experimental vocalists who have developed a singular craft, and vocal communities across different traditions and cultures – and will be released under a speculative fair use license. For the first DADAset I am fortunate to be collaborating with Jaap Blonk, a renowned Dutch sound poet and performer of Dadaist sound poetry.

To let people experience DADAsets, we will create “Tungnaa” (named after the Icelandic “River of Tongues”), an open AI voice synthesis instrument meant to be trained on DADAsets. We will strive to make Tungnaa a hackable, fun, and playful tool that allows artists to explore the unique aesthetics of neural-network-generated audio. Tungnaa should be able to run on a modest laptop without high-end GPU computing resources, and without its underlying technology being hidden behind a web-based gatekeeping portal or paid service, as most voice AI tools are today.
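As a loose illustration of this design goal, and not Tungnaa’s actual codebase, the sketch below shows how a small pretrained synthesis model might be loaded and run entirely on a laptop CPU using PyTorch; the model file name, conditioning shape, and output format are invented assumptions for the example.

```python
# Hypothetical sketch of CPU-friendly local inference; not Tungnaa's code.
import torch

# Fall back to CPU when no GPU is present, and keep thread use modest.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.set_num_threads(4)

# "dadaset_voice_model.pt" is an invented placeholder for a small
# TorchScript-exported voice synthesis network trained on a DADAset.
model = torch.jit.load("dadaset_voice_model.pt", map_location=device).eval()

with torch.inference_mode():
    # An invented conditioning vector standing in for notation/text input.
    conditioning = torch.randn(1, 64, device=device)
    audio = model(conditioning)  # assumed to return a waveform tensor
```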

Inspired by live coding and the typographical experiments of the Dada art movement, Tungnaa also invites artists to invent their own text-based vocal notation systems – notations for everything a human voice can do, including sounds that lie beyond the narrow focus of conventional language or singing.
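To make the notion of an artist-invented vocal notation more tangible, here is a minimal Python sketch assuming an entirely made-up set of glyphs and synthesis hints; none of the symbols or parameters below come from Tungnaa itself.

```python
# Hypothetical example: a tiny custom vocal notation mapped to synthesis hints.
# The glyphs and parameters are invented purely to illustrate the idea of
# artist-defined, text-based vocal notation.

NOTATION = {
    "!":  {"gesture": "glottal burst", "pitch": None,  "noise": 0.9},
    "~":  {"gesture": "breathy hum",   "pitch": 110.0, "noise": 0.4},
    "rr": {"gesture": "uvular trill",  "pitch": 90.0,  "noise": 0.6},
    "o":  {"gesture": "open vowel",    "pitch": 220.0, "noise": 0.1},
}

def parse_score(score: str):
    """Greedily tokenize a score string into vocal-gesture events."""
    events, i = [], 0
    symbols = sorted(NOTATION, key=len, reverse=True)  # longest match first
    while i < len(score):
        for sym in symbols:
            if score.startswith(sym, i):
                events.append(NOTATION[sym])
                i += len(sym)
                break
        else:
            i += 1  # skip characters the notation does not define
    return events

if __name__ == "__main__":
    for event in parse_score("o~rr!o"):
        print(event)
```

Each parsed event could then be handed to a synthesis engine, so the notation itself becomes the artist’s interface to the voice model.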

The project is developed in collaboration with the core AIR hub PINA in Koper, and with composer Mauricio Valdes, who runs PINA’s spatial sound lab HEKA. Once Tungnaa is up and running, we aim to start experimenting with immersive compositional approaches for artificial voice. Drawing on my background in new digital musical instruments and Mauricio’s expertise in immersive audio, we have identified many challenges in making sophisticated spatial audio technology accessible and engaging from the embodied perspective of a musician. We have discussed new gestural approaches to immersive audio composition aimed at making spatial audio more artist-friendly, and together we plan to write a Manifesto for Immersive Sound, which will sketch out a way to bridge the accessibility gap so that musicians of varying technical skill can explore this medium.

Over the course of AIR, DADAsets will include public presentations and workshops that reflect on the research, fostering a dialogue with diverse communities about the social, digital, and economic ecologies around voice data and AI.

Overall, DADAsets is poised to make significant contributions to the field of voice data and AI by challenging popular narratives and promoting a values-first approach to technology creation.

Mauricio Valdes’ point of view

In early February, Jonathan visited PINA to explore the spatial audio studio and discuss accessibility issues in immersive audio with Mauricio. They identified several challenges, including the absence of intuitive software tools and the need for a distinct compositional vocabulary for spatial sound. This initial brainstorming session has since evolved into ongoing discussions and collaborations.

Subsequently, Jonathan engaged with Uwe and Leyla from HLRS to secure supercomputing data for sonification experiments, which may later feed into the spatial audio aspect of the final piece, either through sonification or through spatialisation of the resulting sounds. Leyla provided sample datasets from the HAWK supercomputer, focusing on temperature, power consumption, and communication metrics. Uwe proposed an in-person session to experiment with real-time sonification of these datasets. Additionally, I consulted Nico Formanek, an HLRS philosopher, about organizing a discussion on the philosophy of voice and computing.
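As a concrete illustration of what such sonification can mean in practice, the following minimal Python sketch maps a hypothetical series of node-temperature readings to sine-tone pitches and writes the result to a WAV file; the values, ranges, and mapping are assumptions for demonstration, not the actual HLRS experiment.

```python
# Illustrative parameter-mapping sonification: hypothetical node-temperature
# readings are mapped to sine-tone frequencies and written to a WAV file.
# The data, ranges, and mapping choices are assumptions for demonstration only.
import math, struct, wave

SAMPLE_RATE = 44100
NOTE_SECONDS = 0.25

# Hypothetical temperature samples (degrees C), one per monitoring interval.
temperatures = [38.2, 41.5, 47.9, 55.0, 61.3, 58.7, 49.1, 40.4]

def temp_to_freq(t, t_min=30.0, t_max=70.0, f_min=220.0, f_max=880.0):
    """Linearly map a temperature reading onto a pitch range (Hz)."""
    norm = min(max((t - t_min) / (t_max - t_min), 0.0), 1.0)
    return f_min + norm * (f_max - f_min)

frames = bytearray()
for t in temperatures:
    freq = temp_to_freq(t)
    for n in range(int(SAMPLE_RATE * NOTE_SECONDS)):
        sample = 0.5 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
        frames += struct.pack("<h", int(sample * 32767))  # 16-bit PCM

with wave.open("temperature_sonification.wav", "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(bytes(frames))
```

Power consumption or communication metrics could be mapped in the same spirit to other parameters such as loudness, timbre, or spatial position.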

Jonathan’s interactions at the BSC were facilitated by Fernando, who introduced Sergio and Jofre, two researchers with strong technical and programming skills. Together, we conceptualized tools for real-time sonification of complex data. Initial attempts to find data at BSC were unproductive, and Sergio and Jofre eventually left the project, but I remained involved in discussions with them about AI voice synthesis, linked to Maria’s project.

The conversations have revolved around sonifying life science simulations from their research, and acquiring runtime data from the decommissioned MareNostrum4 for sonification purposes, with Thalia contributing significant insights due to her familiarity with the infrastructure.

Parallel to these collaborative efforts, Jonathan has been developing a real-time voice synthesis tool for “extended” vocal capabilities with software developer Victor Shepardson. Additionally, Jonathan made a research trip to IRCAM, a leading institution in music technology, to consult with researchers, better inform the development of a real-time extended-voice AI instrument, and connect that work with our research on the spatial audio aspect of the project.