Realtime Conjuring for Live Performance
The role that audiovisual Generative AI can play in ethically enhancing human creativity through converged media production
We catch up with Hazel Dixon from our CoSTAR Prototyping team to find out about their recent investigation into the role Generative AI can play in live performance.
What area of research did this project explore and why is it of importance to the Creative Industries?
Realtime Conjuring for Live Performance explored the role that audio and visual generative AI tools can play in a converged media production and distribution pipeline.
Crucial to the CoSTAR National Lab’s remit for R&D in the creative industries is the Lab’s responsibility to explore and signpost ethical ways in which GenAI can be used to enhance human creativity, support new business models and stimulate growth. In this blog, we explain how we put creative ideation at the heart of exploring technical R&D questions that are critical to the sector and inform how we are designing the National Lab’s infrastructure at Pinewood. In this project we explore:
1. How can we integrate generative AI tools, techniques & workflows to support Advanced Production in a physical studio to enable realtime control of lights, LED volume, audio and in-camera VFX in response to an improvised live performance?
2. How can we build converged media production pipelines that enable (1) and are orientated from the outset towards digital distribution of live performance across streaming and metaverse platforms to widen the audience reach?
3. As part of (2), how do we enable audience access and interaction, including with the generative AI process, at a technology, cost and access level that is democratised?
What was the specific opportunity you were exploring?
We were particularly excited by the developments around Nitro Fusion: a fast-rendering image generation tool developed by the National Lab’s Yi-Zhe Song and Dar-Yen Chen at the University of Surrey. Nitro Fusion’s ability to generate images in near real-time, and to stylise those outputs using LoRA training, gave us an exciting opportunity for creative integration into live performance - offering a distinctive visual identity that could maintain artistic integrity (see the sketch after the criteria below for the general pattern). Tapping into best-in-class research into generative AI, as well as creative, production and development expertise, allowed us to set evaluation criteria by which we could assess how effectively we were developing this converged media production. We settled on: immediacy, world integrity and interactivity.
Immediacy – the ability to generate imagery at the speed of thought, allowing the imagery to respond to the performance.
World integrity – how artists can retain creative control over the story world, keeping the style and imagery consistent.
Interactivity – how audiences and creatives can co-create with these systems.
These criteria would help guide us in assessing how effectively we were integrating technologies and creative tools into our pipeline.
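To make the immediacy and world integrity criteria concrete, here is a minimal sketch of the kind of fast, LoRA-styled generation loop we are describing, written against the Hugging Face diffusers library. The checkpoint and LoRA paths are placeholders and the real Nitro Fusion release may load differently; this illustrates the general pattern rather than our production setup.

```python
# Illustrative sketch: near-realtime image generation with a style LoRA.
# Assumes a diffusers-compatible few-step checkpoint; the paths below are
# placeholders, not the actual Nitro Fusion release or the CoSTAR pipeline.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "path/to/fast-few-step-checkpoint",   # placeholder checkpoint ID
    torch_dtype=torch.float16,
).to("cuda")

# Apply the artist's style LoRA so every frame keeps a consistent look
# ("world integrity").
pipe.load_lora_weights("path/to/artist-style-lora")

def conjure(prompt: str):
    """Generate a single stylised frame as quickly as possible ("immediacy")."""
    return pipe(
        prompt=prompt,
        num_inference_steps=1,   # few-step models trade sampling steps for speed
        guidance_scale=0.0,      # distilled one-step models typically skip CFG
    ).images[0]

image = conjure("a storm rolling over a candle-lit tavern, ink-wash style")
image.save("frame.png")
```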
We were also keen to apply our National Lab infrastructure and research expertise to this problem. By drawing on our Futures Studio, LED volume and Disguise capabilities, along with the knowledge across CoSTAR National Lab and its partnerships, how could we create a prototype that demonstrated some of the capabilities of these tools and technologies and started a conversation with industry partners?
How did you start?
The project began with an internal ideation session – presenting the tools and vision to researchers and internal stakeholders at CoSTAR National Lab to explore the creative opportunities generative AI could offer live performance. As our test case, we were initially inspired by live productions of tabletop roleplaying games. These experiences involve collaborative, character-driven storytelling where a lead storyteller creates a world and responds dynamically to unpredictable decisions from other players, who inhabit characters in that world. Dice rolls and other random-chance tools further steer the emergent story in unpredictable ways. Streamed productions of Dungeons and Dragons can reach very large numbers of concurrent viewers, as well as packing out physical venues with capacities of up to 12,500. This gave us a distinctive narrative style, a strong case for audience interaction, and an opportunity to explore latency with the tools. This creative test case drove the design and development of the production.
We then moved on to mapping user needs (performers, audience, production staff) and understanding technical capabilities. Key to this process was understanding the role of the artist and developing tools that allowed team members to moderate and select from the AI imagery. We worked closely with an artist to develop images in a particular artistic style for training our generative AI tools, and with actors who would perform in the show itself.
Alongside this, we needed to set up our infrastructure to allow our systems (Disguise via ComfyUI) to talk to one another. We set up the brain bar – our team of engineers and the tools that allowed us to drive the production – and integrated OKO from Magnopus to connect the studio with our online audiences.
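For readers curious about the plumbing: ComfyUI exposes an HTTP endpoint for queuing generation jobs, which is one way a brain-bar script can drive it. The sketch below shows that general shape; the workflow file, node ID and server address are placeholders, and the hand-off of finished frames to Disguise is not covered here.

```python
# Illustrative sketch: queuing a generation job on a local ComfyUI server.
# The workflow JSON, node ID and server address are placeholders, not the
# team's actual Disguise/ComfyUI integration.
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"   # default local ComfyUI address (assumption)

def queue_prompt(prompt_text: str, workflow_path: str = "show_workflow.json"):
    """Load an exported workflow, swap in the live prompt, and queue it."""
    with open(workflow_path) as f:
        workflow = json.load(f)          # a workflow exported in ComfyUI's API format

    # "6" is a hypothetical node ID for the positive-prompt text node;
    # real IDs depend on the exported graph.
    workflow["6"]["inputs"]["text"] = prompt_text

    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())   # response includes the queued prompt_id

# Example: an operator at the brain bar pushes the current scene description.
queue_prompt("the party descends into a flooded crypt, ink-wash style")
```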
What happened next?
We built an early alpha, proof-of-concept piece of research infrastructure to support live in-person and online audiences. We ran an iterative process of regular demos, tests and show-and-tells, seeking input and feedback throughout the development cycle.
We then held a final showcase where we invited critical friends from different screen and performance sectors who could provide us with useful feedback on what had been developed.
These performances featured:
Unreal Engine environments: dynamic scene backdrops that changed based on the location of the scene in the game (see the cueing sketch after this list).
On-screen avatars: designed alongside the actors in Flux AI and displayed throughout.
Generative sound: music generated by Suno and Riffusion, and SFX from ElevenLabs.
Audience interaction: both in person and via OKO (by Magnopus), a browser-based 3D space that enabled live interaction and social play.
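The write-up above does not detail how scene changes were cued. Purely as an illustration, one common live-production pattern is to send a small control message, for example over OSC, from the operator's desk to whatever is rendering the backdrop. The address, port and the use of OSC itself are assumptions for illustration, not a description of the show's actual control path.

```python
# Illustrative only: cueing a backdrop change over OSC using python-osc.
# The listener address, port and OSC path are hypothetical.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)   # hypothetical show-control listener

def cue_scene(location: str) -> None:
    """Tell the environment renderer which story location is now active."""
    client.send_message("/scene/location", location)   # hypothetical OSC address

cue_scene("flooded_crypt")   # the backdrop follows the story into the crypt
```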
What did you learn? Do you have any advice for people working in this space?
Through our tests we gathered feedback from audiences and industry stakeholders, which led to several key insights. Firstly, a key to integrating generative technologies into live performance is understanding how immediacy and speed intersect with the work. Latency – the time between the creative moment that prompts the AI and the generation and display of the content – is a crucial aspect for creatives to understand. When images were slow to generate, the story had often moved on and the images became at best irrelevant and, at worst, actively interfered with the work. The nature of this performance was relatively generous in terms of latency – performers were able to improvise while they waited for the imagery to appear. In a performance style such as dance or music, the medium may be less forgiving. Designers need to work with this constraint in mind.
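To make the latency point concrete, here is a minimal sketch of how prompt-to-display time might be measured around a generation call; generate_and_display is a hypothetical stand-in for whatever the real pipeline does.

```python
# Minimal sketch: measuring prompt-to-display latency around a generation call.
# generate_and_display() is a hypothetical stand-in for the real pipeline.
import time

def timed_generation(prompt: str, generate_and_display) -> float:
    """Return seconds from the creative moment (prompt sent) to content on screen."""
    start = time.perf_counter()
    generate_and_display(prompt)          # generation plus hand-off to the display system
    return time.perf_counter() - start

# A dance or music cue might only tolerate a fraction of a second; improvised
# dialogue, as in this show, can absorb a longer wait.
latency = timed_generation("a dragon circles the tower", lambda p: None)
print(f"prompt-to-display latency: {latency:.2f}s")
```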
Secondly, by looking at world integrity, we explored how creative control for artists is essential in the development process. Artists need to be able to control quality and integrate themselves within new technologies for performance – treating the technologies as a creative collaborator. The human needs to be in the loop when it comes to moderation and centralised decision-making across an array of generative AI tools.
Finally, by focusing on interactivity, we explored how audiences and creatives need to be able to see tools and systems as creative collaborators. There are always significant lessons from past practice when working with emerging technologies. Our industry feedback session included seasoned broadcast TV gallery directors who immediately saw opportunities to draw from the past to think about the future, and to feed back on how they might utilise our prototypes in their own production processes.
What happens next?
We are looking to turn our proof-of-concept infrastructure into a more robust, configurable and modular set of production tools that more users of the National Lab can come and explore and exploit. Being able to plug different AI models and tools into this system will make for a more stable production pipeline and best reflect the creative and technology ambitions of building a converged media production Lab at CoSTAR National Lab. We hope to offer opportunities for creative companies to experiment with this production pipeline, develop creative formats and give feedback on the utility of these tools for their creative ambitions. Watch this space!
Find out more and take a look behind the scenes of a Realtime Conjuring live performance in this explainer video:
Follow us on LinkedIn and Instagram and sign up to our newsletter to be the first to hear about upcoming news and opportunities from CoSTAR National Lab.
The Prototyping Team
The CoSTAR National Lab prototyping team is a small multidisciplinary team of researchers, producers, creative technologists, and developers from across the CoSTAR National Lab’s core partners: NFTS, University of Surrey, Abertay University and Royal Holloway, University of London.
Led by Miles Bernie (Co-Head of Innovation), the team for this project were: Helen O’Neill (Senior Training and Innovation Producer), Hazel Dixon (Inclusion and Ethics Research Fellow), Dr Violeta Menéndez González (Senior Research Software Engineer), Johnny Johnson (Senior Creative Technologist), Cristen Caine (R&D Producer), Neil Smith (Senior Technician), Destiny Lawrence (Technician, Advanced Production), Branden Faulls (Team Lead, Prototyping), Lewis Connolly (Developer), Katie Eggleston (UX Designer), Elliot Hall (Senior Research Engineer), Lorna Batey (Software Developer), Talia Finlayson (Creative Technologist), Cody Updegrave (Producer).