WHAM: WEBCAM HEAD-TRACKED AMBISONICS
This paper describes the development and implementation of a real-time head-tracked auralisation platform using Higher Order Ambisonics (HOA) decoded binaurally, built on open-source and freely available web technologies without the need for specialist head-tracking hardware. An example implementation of this work can be found at: https://brucewiggins.co.uk/WHAM/.

To provide an immersive experience of 3D sound via binaural reproduction over headphones, it is widely acknowledged that head tracking is beneficial, as it maintains the auditory cues used in daily life (providing dynamic ITD, ILD and pinna cues). Head tracking has been shown to improve externalisation and source localisation within audio scenes, and to reduce the number of front-back confusions1,2. The implementation of head-tracked binaural reproduction is vastly simplified using Ambisonics, which is based on the spherical harmonic decomposition of a sound field at a single point3. YouTube, Facebook and Oculus (for example) have used this combination of formats (Ambisonics decoded binaurally) for 360 video and VR applications because of this flexibility. The system is not without limitations, however: spatial aliasing occurs above a frequency which, for a head-sized radius of correct reconstruction, can be approximated as ~600 Hz for 1st-order Ambisonics, 1200 Hz for 2nd order, 1800 Hz for 3rd order, and so on. Lower orders do, however, offer the benefit of lower channel counts and good computational efficiency, which is particularly important for implementation on mobile and lower-power devices; the current sweet spot between performance and efficiency is 3rd-order Ambisonics.
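The rules of thumb above can be stated compactly: the spatial aliasing frequency for a head-sized region of correct reconstruction scales roughly linearly with Ambisonic order (approximately 600 Hz per order), while the channel count grows as (M + 1)². A minimal sketch of these figures (function names are illustrative, not from the platform described here):

```typescript
// Approximate spatial aliasing frequency for a head-sized sweet spot,
// using the ~600 Hz-per-order rule of thumb quoted above.
function aliasingFrequencyHz(order: number): number {
  return 600 * order; // ~600 Hz (1st order), 1200 Hz (2nd), 1800 Hz (3rd), ...
}

// Number of Ambisonic channels required for a full 3D order-M representation.
function channelCount(order: number): number {
  return (order + 1) ** 2; // 4 channels at 1st order, 16 at 3rd, 36 at 5th
}
```

The quadratic channel growth is why lower orders remain attractive on mobile and low-power devices: moving from 3rd to 5th order more than doubles the number of channels to transmit and decode.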
Various techniques to improve transparency at low orders whilst maintaining this computational efficiency are being actively researched4; good examples of these optimisations include time alignment of the binaural filters at high frequencies (>1 kHz), diffuse-field equalisation, and the assumption of a symmetrical head model (which halves the number of convolutions needed)5. Current implementations target correct and stable identification of source direction and a flat frequency response for dry sources, with 3rd-order microphones and tools now becoming readily available for both capturing and synthesising auditory scenes. However, when auditioning more complex scenes that include room response, work by Dring and Wiggins6 has shown that improvements in spatial reproduction are apparent up to a much higher order. For applications such as the auralisation of real or synthetically simulated rooms, where accuracy rather than efficiency is the priority, the use of higher orders than currently available tools and equipment allow is desirable.
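The symmetric-head optimisation mentioned above exploits the fact that, in the standard ACN ordering, each spherical harmonic is either symmetric or antisymmetric about the median plane (antisymmetric for negative degree m). With left/right symmetric binaural filters, each Ambisonic channel then needs only one convolution: the right-ear signal is obtained by flipping the sign of the antisymmetric contributions rather than running a second filter. A minimal sketch of this sign-flip step, assuming each channel has already been convolved with its single symmetric-head filter (all names here are illustrative, not from the paper):

```typescript
// ACN channel index for spherical harmonic of order l, degree m.
const acn = (l: number, m: number): number => l * l + l + m;

// Harmonics with m < 0 (the "sine" harmonics) are antisymmetric
// about the median plane.
const isAntisymmetric = (l: number, m: number): boolean => m < 0;

// Given per-channel convolved signals y (one Float32Array per Ambisonic
// channel, each already filtered once), form both ear signals for one
// sample frame. Only (order + 1)^2 convolutions are needed upstream,
// instead of 2 * (order + 1)^2 for an asymmetric head.
function earSignals(
  order: number,
  y: Float32Array[],
  frame: number
): [number, number] {
  let left = 0;
  let right = 0;
  for (let l = 0; l <= order; l++) {
    for (let m = -l; m <= l; m++) {
      const s = y[acn(l, m)][frame];
      left += s;
      right += isAntisymmetric(l, m) ? -s : s; // sign flip replaces a 2nd convolution
    }
  }
  return [left, right];
}
```

Only the per-channel sums differ between the ears, so the convolution workload (the dominant cost) is halved, which is exactly the kind of saving that matters for a browser-based decoder on a low-power device.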