Stereo Rendering Environments Project Proposal
Introduction
For my major project I propose an inquiry into audio encoding technologies that render spatial, 360° directional sound within stereo playback. I will apply directional encoding software to instrumental audio sources to define spatial information within a rendered playback environment relayed via a stereo field. The focus is the application of these techniques to stereo music productions: how effectively these directional rendering technologies can enhance the spatial capacity of a mix’s stereo field, and how practical they are to apply.
The inquiry will cover existing encoding methods such as binaural rendering, head-related transfer functions (HRTFs) and ambisonics, alongside 3D virtual rendering environments such as Dolby Atmos, Apple’s Spatial Audio, and Steam Audio. I aim to investigate the application of these encoders, the practicality of their use, and their effect within a mix, through the development of an implementation framework. The goal of this project is to create a portfolio of stereo music productions with creative use of these techniques applied to tracks, including render samples produced as I carry out and refine my own implementation process for 3D audio rendering.
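To illustrate the kind of processing these encoders perform, the following is a minimal sketch of first-order ambisonic (B-format) encoding. This is my own illustrative code under the standard panning equations, not taken from any of the named products: a mono source is panned to a direction by weighting it across the omnidirectional W channel and the directional X/Y/Z channels.

```python
import numpy as np

def encode_fo_ambisonics(mono, azimuth_deg, elevation_deg=0.0):
    """Encode a mono signal into first-order B-format (W, X, Y, Z).

    W carries the omnidirectional component (scaled by 1/sqrt(2), as in
    the traditional FuMa convention); X, Y and Z carry figure-of-eight
    directional components for front-back, left-right and up-down.
    """
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    w = mono * (1.0 / np.sqrt(2.0))      # omnidirectional component
    x = mono * np.cos(az) * np.cos(el)   # front-back
    y = mono * np.sin(az) * np.cos(el)   # left-right
    z = mono * np.sin(el)                # up-down
    return w, x, y, z

# Hypothetical usage: a 1 kHz test tone panned 90 degrees to the left,
# so the directional energy lands entirely in the Y (left-right) channel.
sr = 48000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
w, x, y, z = encode_fo_ambisonics(tone, azimuth_deg=90)
```

A full renderer would then decode these four channels for a given playback target (stereo, binaural or a speaker array), which is where the encoders under investigation differ.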
The practical crux is that surround formats such as 5.1, 7.1 and Dolby Atmos typically require either large and costly setups for playback or specialised equipment for recording. The inquiry is therefore aimed at the middle ground of the mixing (post-production) process, where spatial information can be defined regardless of the input material’s format (mono, stereo or surround) or the playback medium (stereo or surround). The creative crux is that spatialisation rendering technology may now allow the directional limitations of the typical L/R stereo field to be redefined, with the aim of creatively mixing stereo music with the distinct directional and spatial properties typically reserved for surround formats.
The project’s inquiry will include practice-based experimentation, with action research undertaken to analyse and refine the rendering process. It fits an interpretivist paradigm, given the psychoacoustic nature of the subject and my own intrinsic impact as the researcher. Creative influence will be part of the practical application, but will be kept distinct from the preliminary inquiry into the spatial application and rendering effect of the various encoders.
My project will be submitted towards a BA classification because the success of the practical and creative application of the developed rendering process, through the creation of music productions, will depend on subjective reflection and an interpretivist approach, despite the objective analysis required during rendering development. The project will have a 60/40 weighting between written work and research, and practical work.
Rationale
The reasoning behind this project spans both creative and practical application to my own work as a music producer and mixing engineer. Practically, it will develop my understanding of the emerging area of spatialisation technologies within the audio production industry. This will benefit my potential career path as I further my understanding of technologies that are gaining wider industry application and beginning to incorporate 3D representations of previously 2D forms. Evidence of this integration among emerging technologies with significant industry and consumer interest includes VR applications such as Apple’s Vision Pro, which elevates existing 2D visual representations into a 3D space. Similarly, in audio there is the establishment of surround sound across specialised playback mediums, such as Dolby Atmos suites for production and playback, alongside peripheral hardware developments such as PlayStation’s Pulse 3D gaming headset, which implements Sony’s Tempest 3D AudioTech engine for surround playback over specialised headphones.
The reasoning behind the stereo approach is that these emerging commercial developments in, and consumer adoption of, surround playback are typically costly and inaccessible to the regular consumer. They have therefore not seen wide utilisation in mainstream audio production, despite the fascinating potential of redefining the 2D L/R stereo field as a 3D rendered environment. Focusing the inquiry on bringing spatial sonics and directional qualities into the already established stereo standard could make for an interesting counter to the existing norms of surround sound production and playback. Moreover, the trend across AV playback mediums of incorporating 3D rendering into existing technology raises the question of when, and how, it will be incorporated into music production for widespread, easily accessible adoption.
Regarding my own trajectory of skill development and professional specialisation, this subject area fits well with my previous and current learning and experience, and aligns with my aim of progressing into a career in audio production and technology. Alongside my four years of experience in audio production and mixing and my current studies in the field, I have past higher-education experience in computer science, which brings familiarity with developmental action-research processes such as programming and debugging.
From my own personal and creative perspective, I approach music production as an artist who treats the mixing process as a canvas to be layered and embellished with sound. I see potential here to incorporate these encoding methods to create mixes with enhanced qualities of space, depth, dynamics and spatial sonics on the canvas of an L/R stereo field. This is paramount, as such mixes can still be reproduced via the existing stereo format. A direct inspiration for my creativity within this project has been the work of Claude Debussy, whose integration and awareness of space was instrumental to the development of his compositions and integral to his style. The quote below is the direct inspiration, with a specific reinterpretation: throughout the creative output of this project I aim to enhance musical space within a rendered dimension, rather than time within a composition.
“Music is the space between the notes” - Claude Debussy
Methodology
For the project’s practice-based inquiry, as the participatory practitioner I will work within a subjective, interpretivist paradigm, and must actively acknowledge my own impact on the research within the project.
The practice-based inquiry itself includes the action-research development and practical application of a rendering method through experimentation with the various encoder technologies available, which I will analyse through critical listening. To develop a successful rendering process for musical mix applications, I must first rigorously and critically listen to each encoder’s output to distinguish its inherent capabilities, in order to reach an application of encoding that is both practically worthwhile and creatively effective. This will be assessed against existing software plugin integration, ease of use, and versatility of effect.
Being subjective in nature, the effect of audio spatialisation is psychoacoustic and therefore depends on parameters defined by the listener’s auditory system, which are beyond the control that would be required for a firmly objective analysis of the rendering effect and its perception.
Because audio is interpreted perceptually, and because I am an audio engineer myself, my own interpretation may differ from that of the average listener in terms of practicality and effect. I aim to mitigate this by being as objective as possible in my critical listening during rendering development. To achieve this I will also inquire into critical listening strategies: the research design will use critical listening theory for observation and analysis to mitigate subjectivity. Throughout the methodology I will draw a clear line between the subjective aspects of my analysis and any objective observations undertaken during the rendering development process.
Development of the rendering process will be carried out through action research, producing effect sample demos via the cyclical framework of:
- Development / Action
- Analysis
- Conclusion
This will allow for the development and refinement of the rendering process, analysed objectively while acknowledging my own subjectivity.
Analysis will be qualitative, with written assessment focusing on the distinct directionality, spatialisation and quality of the proposed rendering encoders, the aim being to find the most effective processing setup for the technology. From this I will draw conclusions based on quantitative and qualitative analysis of the quality of the demos and the success of the rendering development’s implementation into stereo audio, with observations of playback informing further development.
The action research method chosen for the practical development process suits the reflective, iterative approach of reviewing the output of rendering and encoding experimentation for continual refinement. This process is similar to debugging, an efficient development method rooted in the action-research approach. As audio processing development is software-based, action research is directly relevant: the conclusions reached could benefit further research and development within the field of audio technology, as well as my own creative output.
The creative output of this project will consist of a portfolio of stereo music productions, including musical compositions in addition to the demo productions, implementing the rendering processes I will investigate. This will be produced to highlight what these encoding methods can bring to audio production, and specifically to music mixing. The productions will be musical in nature, with an instrumental focus and ambience embellished with spatial detail and depth. The purpose of this creative output is to demonstrate the spatial audio capabilities of the stereo format, which I will investigate throughout the action research and experimentation.
Annotated Bibliography
Blue Ripple Sound (N.D.), “HOA Technical Notes - Introduction To Higher Order Ambisonics”, Online : Available at : https://www.blueripplesound.com/notes/hoa
[Accessed : 01/06/23]
In this article, Blue Ripple Sound provides an overview of the technical notes on higher-order ambisonics (HOA) and the processing involved in creating 3D HOA soundfields and ambisonic formats, along with how they differ from other surround sound formats. It is particularly relevant because it refers to the specific processing of soundfield generation for stereo playback: “first order” formats including the stereo-compatible UHJ (Universal Matrix System 45J) format (C-Format). The reference does not delve extensively into the subject, so further referencing will be required. It also covers a brief technical description of HOA processing stages, including encoding and decoding, as well as an explanation of the difference between ambisonics and HOA.
D.N. Zotkin et al. (2004), “Rendering localised spatial audio in a virtual auditory space”, IEEE,
Online : Available at : https://ieeexplore.ieee.org/abstract/document/1315647/references [Accessed : 01/06/23]
Within this academic paper, the section of specific relevance is “Audio Scene Rendering Algorithms”. It covers the rendering of accurate audio environments and details how this can be achieved using HRTFs and environmental sound-source cues. This is significant to the development process of my methodology, and it sheds light on the underlying mechanics of the encoders I will be implementing to render spatialised audio. A key principle of spatial audio referenced here is “externalisation”, which will be one of the aspects of quality I will analyse during the action-research development.
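As a concrete illustration of the HRTF-based rendering this kind of paper describes, binaural output can be sketched as the convolution of a mono source with a pair of head-related impulse responses (HRIRs), one per ear. The toy impulse responses below are placeholders of my own invention, not measured data; they simply mimic the interaural time and level differences a real HRIR set would encode.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono source to binaural stereo by convolving it with a
    left-ear and right-ear head-related impulse response (HRIR)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs (hypothetical, not measured): the right-ear response is
# delayed by two samples and attenuated, crudely mimicking the interaural
# time and level differences for a source positioned to the listener's left.
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.6, 0.0])
src = np.random.default_rng(0).standard_normal(1024)
stereo = binaural_render(src, hrir_l, hrir_r)
```

Production encoders interpolate between measured HRIRs for many directions and typically perform the convolution in the frequency domain for efficiency, but the underlying principle is the same.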
Apple Books (N.D.) “Overview of B-format surround encoding in Impulse Response Utility”, Online : Available at : https://support.apple.com/en-ie/guide/logicpro-iru/dev022fbc493/mac
[Accessed : 16/06/23]
This article contains an overview of the ambisonic B-format encoding used in Apple’s Impulse Response Utility, the companion application to Logic Pro’s “Space Designer” convolution reverb plugin. It is relevant to my investigation into the spatial encoding technologies already available and accessible within audio production. Of specific significance within this resource is the visual representation of how a 3D spatial sound field is virtually rendered to incorporate height and depth, and the illustration of how sound pressure is handled in the processing.
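To make the B-format concept concrete, the sketch below is my own illustrative code (not Apple’s implementation): it encodes a mono source to horizontal first-order B-format and then decodes it to stereo via two virtual loudspeakers. A production stereo decoder such as UHJ uses phase-based processing instead, so this basic decode is a sketch of the principle only.

```python
import numpy as np

def encode_fo(mono, azimuth_deg):
    """Encode a mono signal to horizontal first-order B-format (W, X, Y)."""
    az = np.radians(azimuth_deg)
    return mono / np.sqrt(2.0), mono * np.cos(az), mono * np.sin(az)

def decode_to_stereo(w, x, y, spread_deg=30.0):
    """Basic (velocity) decode to two virtual loudspeakers at +/- spread_deg.
    Illustrative only: not a psychoacoustically optimised decoder."""
    th = np.radians(spread_deg)
    left = 0.5 * (np.sqrt(2.0) * w + x * np.cos(th) + y * np.sin(th))
    right = 0.5 * (np.sqrt(2.0) * w + x * np.cos(th) - y * np.sin(th))
    return left, right

# A source encoded at +30 degrees (toward the left virtual speaker)
# should decode with more energy in the left channel than the right.
rng = np.random.default_rng(1)
src = rng.standard_normal(4096)
left, right = decode_to_stereo(*encode_fo(src, azimuth_deg=30.0))
```

The asymmetry between the decoded channels is exactly the directional information the stereo field can carry, which is what this project aims to exploit creatively.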
Lakulish, Valve Corporation (N.D.) “Steam Audio Unity User Guide”, Online : Available at : https://github.com/ValveSoftware/steam-audio/blob/master/unity/doc/guide.rst
[Accessed : 16/06/23]
The above page is a primary resource from the developers of one of the leading pieces of audio spatialisation software, typically integrated into video game development. The user guide is the specific focus within the wider GitHub repository, revealing the underlying functionality of the software. As a free, accessible piece of software, it is relevant not only for its thorough coverage of the adjustable parameters of its spatialisation functions and features, but also as an example of accessibility within the field.
Näf, M. et al (2002) “Spatialized Audio Rendering for Immersive Virtual Environments”, Online : Available at : https://escholarship.org/content/qt41r178s4/qt41r178s4.pdf?t=ptt2z1 [Accessed 20/06/23]
This paper is relevant to the field of spatial audio rendering as it presents an audio rendering system for virtual environments. Although the paper’s age means it lacks recently established conventions, it provides an informative breakdown of key terminology that remains relevant to the area of study, and is therefore still a valuable resource.
Carl Schissler et al (2016) “Efficient HRTF-based Spatial Audio for Area and Volumetric Sources”, IEEE, Online : Available at : https://ieeexplore.ieee.org/abstract/document/7383327 [Accessed 20/06/23]
This source provides a good example of a developmental approach, similar to mine, to constructing spatial audio rendering and encoding methods. With its in-depth information on HRTF implementation, this source is directly relevant to that area of my own project. Although a slight bias is present in the abstract, I will be aware of this when using the resource and will work to mitigate its influence on my own work.
Jakob H et al (2004), “Managing Risk in Software Process Improvement: An Action Research Approach”, MIS Quarterly, Online : Available at : https://www.jstor.org/stable/25148645
[Accessed 02/07/23]
The above journal article examines the risks associated with software process improvement that restrict the development and implementation of new processes. With this, I will be aware of areas requiring attention in my own implementation of the audio rendering process, to mitigate possible issues or risk. As it is grounded in practice and action research, it will be valuable to the project’s development and implementation.
University of York, “Spatial audio and virtual acoustics”, School of Physics, Engineering and Technology, Online : Available at : https://www.york.ac.uk/physics-engineering-technology/research/communication-technologies/audio-and-acoustics/spatial-audio-virtual-acoustics/ [Accessed 20/07/23]
This article shows an example of the implementation of spatial audio encoding processes and virtual acoustic environments related to the project I am undertaking. Though the specific encoding used in the projects is ambiguous (the Room Acoustics Modelling work is described as resource-intensive), the significance of this resource is its demonstration of both academic interest in virtual acoustic environments and an implementation of encoding models. This will be a valuable area of research, as furthering my knowledge of virtual acoustics will improve my understanding of the processes involved in rendering custom 3D environments.
Resources
Clément Gaultier, et al (2017), “VAST : The Virtual Acoustic Space Traveler Dataset”, Online : Available at:
https://hal.science/hal-01416508
[Accessed : 26/07/23]
Simon Fraser University (2020), “Tutorial For The Handbook For Acoustic Ecology”, Online : Available at: https://www.sfu.ca/sonic-studio-webdav/cmns/Handbook%20Tutorial/AcousticSpace.html [Accessed : 26/07/23]
Tapio Lokki (2008), “Handbook of Signal Processing in Acoustics - Virtual Acoustics”, Online : Available at :
https://link.springer.com/chapter/10.1007/978-0-387-30441-0_39
[Accessed : 26/07/23]
Brereton, J (2017), “Music perception and performance in virtual acoustic spaces.”, Online : Available at:
https://psycnet.apa.org/record/2017-21406-012
[Accessed : 26/07/23]
U.P. Svensson (2002), “Modeling acoustic spaces for audio virtual reality”, Online : Available at: https://www.researchgate.net/publication/215514469_Modelling_acoustic_spaces_for_audio_virtual_reality [Accessed : 26/07/23]
University of York (2023), “Spatial audio and virtual acoustics”, Online : Available at: https://www.york.ac.uk/physics-engineering-technology/research/communication-technologies/audio-and-acoustics/spatial-audio-virtual-acoustics/
[Accessed : 26/07/23]
Max Cooper (2020), “Resynthesis (3D Binaural Audio) by Max Cooper and Kevin McGloughlin [Headphones Only]”, Online : Available at: https://youtu.be/Xkiyan7fBvk
[Accessed : 26/07/23]
Samia Ibtasam (2015), “Beyond Access: Broadening technological and financial inclusion”, Online : Available at: https://digital.lib.washington.edu/researchworks/handle/1773/48891
[Accessed : 26/07/23]
City University of Hong Kong (2023), “Virtual Acoustic Space”, Online : Available at: https://auditoryneuroscience.com/spatial-hearing/virtual-acoustic-space
[Accessed : 26/07/23]
City University of Hong Kong (2023), “Spatial Hearing”, Online : Available at: https://auditoryneuroscience.com/spatial-hearing
[Accessed : 26/07/23]
J. L. Gonzalez-Mora (2006), “Seeing the world by hearing: Virtual Acoustic Space (VAS) a new space perception system for blind people.”, Online : Available at: https://ieeexplore.ieee.org/document/1684482
[Accessed : 26/07/23]
J. L. Gonzalez-Mora (2002), “Virtual Acoustic Space Research”, Online : Available at: http://research.iac.es/proyecto/eavi/investigacion.html
[Accessed : 26/07/23]
Stanford University (2023), “Immersive virtual acoustic spaces”, Online : Available at: https://otl.stanford.edu/researchers/high-impact-technology-hit-fund/Immersive-virtual-acoustic-spaces
[Accessed : 26/07/23]
Apple Inc (2023), “Overview of B-format surround encoding in Impulse Response Utility”, Online : Available at:
https://support.apple.com/en-ie/guide/logicpro-iru/dev022fbc493/mac
[Accessed : 26/07/23]
Katsuhiro Chiba (2016), “Synthetic HRTF 3D Audio Test 2 (for Headphones)”, Online : Available at:
https://youtu.be/QhzgQ2j0miI
[Accessed : 26/07/23]
Spatial Hearing Lab (2022), “Demonstration of 3D audio rendering using individualized Head-Related Transfer Function (HRTF)”, Online : Available at: https://youtu.be/ZTDOhZDkek4
[Accessed : 26/07/23]