Generating binaural sounds for immersive audio environments
I’ve recently been involved in a project to build an audio game that uses binaural sound to create a 3D audio environment. My task was to write tools to convert anechoic mono sound samples into binaural samples that appear to originate from an arbitrary position around the listener’s head.
How do binaural sounds differ from stereo? Stereo is captured using two sensors, which lets us determine the direction of a sound by measuring the time delay of the incoming signal: a sound to the left reaches the left sensor slightly earlier than the right, and so on. But with two sensors measuring time delay alone we can’t differentiate between sounds on opposite sides of the sensor axis, or out of the plane of the sensors. And yet with only two ears we are all able to accurately identify the location of sounds behind or above us. So where is the extra information coming from?
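To put a rough number on the time-delay part, here’s a minimal Java sketch using Woodworth’s spherical-head approximation (my assumption for illustration, not something from the game project). Note that it gives the same delay for mirrored front/back positions, which is exactly the ambiguity described above.

    // Rough interaural time difference (ITD) estimate for a spherical head.
    public class InterauralTimeDifference {

        // Woodworth's approximation: ITD ~ (a / c) * (theta + sin(theta)),
        // reasonable for azimuths up to about 90 degrees from straight ahead.
        public static double itdSeconds(double azimuthDegrees) {
            double a = 0.0875;                // assumed head radius in metres
            double c = 343.0;                 // speed of sound in m/s
            double theta = Math.toRadians(azimuthDegrees);
            return (a / c) * (theta + Math.sin(theta));
        }

        public static void main(String[] args) {
            // A source directly to one side (90 degrees) gives roughly 0.65 ms of delay.
            System.out.printf("ITD at 90 degrees: %.3f ms%n", itdSeconds(90) * 1000);
        }
    }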
As we develop from infancy we learn how our heads and ears modify sounds arriving from different directions, and so we gain extra directional information. The ears are not omnidirectional: they have a strong forward sensitivity, and sounds from behind are attenuated. In addition, this attenuation is not uniform across all frequencies, and we learn to associate this unique filtering of sounds with the direction from which they originate. This means that the binaural experience is to some extent individual and we’re all tuned to our own particular characteristics: your auditory pathways are connected differently to mine. This paper describes an experiment (using ferrets, of all things) which demonstrates that localisation ability is reduced when listening to sounds modified by a head/ear response that is not the listener’s own. By recording sounds with microphones embedded in a dummy head, or inserted into the ear canals of real people, recordings can be made that recreate this effect when played back over headphones. However, it is not always convenient to record sounds in this way and, as in the case of the binaural game project, we want to synthesise the effect instead.
In order to generate binaural effects we need to capture the way that the head and ears modify sound. This can be done by measuring the impulse response (IR) at the ears for sounds arriving from different positions around the listener. The impulse response is the response of a system to an infinitely short, infinitely loud signal (one with an infinite spectrum). In reality such signals are difficult to generate, so alternative sources are used, such as starter pistols. Pistols are good for characterising large spaces such as halls, but when measuring human responses we can use something much gentler such as a time-stretched pulse: a single low-power tone swept over the range of frequencies of interest. Post-processing can compress this sweep and determine the IR as if it had been generated by a wideband pulse. What you end up with is a collection of responses known as ‘head-related impulse responses’ (HRIRs), or ‘head-related transfer functions’ (HRTFs) in the frequency domain. These can be used to create a binaural sound from a mono sample by convolving the HRIR with the sample. There’s a nice interactive demonstration of this very powerful technique at the Joy of Convolution and a longer discussion of binaural recording here.
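To make the convolution step concrete, here’s a minimal Java sketch (not the code from my application, just an illustration): the mono sample is convolved separately with the left-ear and right-ear HRIRs, and the two results are interleaved into a stereo buffer.

    // Minimal time-domain convolution sketch for producing a binaural stereo buffer.
    public class HrirConvolver {

        // Direct (brute-force) convolution: output length = signal.length + ir.length - 1.
        static float[] convolve(float[] signal, float[] ir) {
            float[] out = new float[signal.length + ir.length - 1];
            for (int n = 0; n < out.length; n++) {
                float acc = 0f;
                int kMin = Math.max(0, n - signal.length + 1);
                int kMax = Math.min(n, ir.length - 1);
                for (int k = kMin; k <= kMax; k++) {
                    acc += ir[k] * signal[n - k];
                }
                out[n] = acc;
            }
            return out;
        }

        // Convolve the mono sample with each ear's HRIR and interleave left/right samples.
        static float[] binaural(float[] mono, float[] hrirLeft, float[] hrirRight) {
            float[] left = convolve(mono, hrirLeft);
            float[] right = convolve(mono, hrirRight);
            float[] stereo = new float[left.length * 2];
            for (int i = 0; i < left.length; i++) {
                stereo[2 * i] = left[i];        // left channel
                stereo[2 * i + 1] = right[i];   // right channel
            }
            return stereo;
        }
    }

In practice an FFT-based (fast) convolution would be used for longer samples, but the brute-force version above shows the idea.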
Lots of work has already been done in this area, some of which has been very generously shared. A group at IRCAM (Institut de Recherche et Coordination Acoustique/Musique) have made available a number of HRIR measurements from different individuals that can be used to create binaural effects. I’ve written a Java application to apply these HRIRs to mono sounds to create binaural effects. I’ve found that percussive sounds, rich in high frequencies, are most convincing. You can read more about it and download it below. As mentioned, the binaural experience is different for each individual so it’s difficult to predict if a particular HRIR set will work for everyone. You may need to try a number of HRIR sets from the link above before you find one that works for you.
The audio games are still in the development stage, but I’ll post back when there’s anything I can show you.
Download
Run
Either double-click the file BinauralSound.jar or from the command line type
    java -jar "BinauralSound.jar"
This method is preferable as it’ll let you see any console messages that are generated (usually exceptions due to invalid sound file formats).
Sound test
Check to see if the Java sound system is working correctly by clicking the ‘Sound test’ button. You should hear a 1kHz tone alternating from left to right. On Linux the Java sound system sometimes complains that the host sound device is busy. This can usually be solved by restarting the Linux sound service. For example, if you’re using ALSA then as root type:
    /etc/init.d/alsa force-restart
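For the curious, an alternating 1kHz test tone like the one in the sound test can be produced with a few lines of javax.sound.sampled code. This is not the application’s own implementation; the burst length, level and number of repeats below are arbitrary choices for the sketch.

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.SourceDataLine;

    // Plays a 1kHz tone alternating between the left and right channels.
    public class ToneTest {
        public static void main(String[] args) throws Exception {
            int sampleRate = 44100;
            AudioFormat fmt = new AudioFormat(sampleRate, 16, 2, true, false); // 16-bit stereo, little-endian
            SourceDataLine line = AudioSystem.getSourceDataLine(fmt);
            line.open(fmt);
            line.start();

            byte[] buffer = new byte[sampleRate * 4]; // one second of stereo frames
            for (int burst = 0; burst < 4; burst++) {
                boolean left = (burst % 2 == 0);      // alternate channels each second
                for (int i = 0; i < sampleRate; i++) {
                    short s = (short) (Math.sin(2 * Math.PI * 1000 * i / sampleRate) * 12000);
                    short l = left ? s : 0;
                    short r = left ? 0 : s;
                    buffer[4 * i]     = (byte) (l & 0xff);
                    buffer[4 * i + 1] = (byte) ((l >> 8) & 0xff);
                    buffer[4 * i + 2] = (byte) (r & 0xff);
                    buffer[4 * i + 3] = (byte) ((r >> 8) & 0xff);
                }
                line.write(buffer, 0, buffer.length);
            }
            line.drain();
            line.close();
        }
    }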
Generating binaural sounds
Download a .zip file of HRTF measurements (strictly speaking, HRIR measurements) from the LISTEN HRTF DATABASE and load it into the application (no need to unzip it). Load a short (<1 MB) sample to be transformed and click ‘Play’. The sample must be 44.1kHz PCM and 8, 16 or 24 bit. It can be mono or stereo, but if it is stereo then only the left channel will be used to generate the binaural output.
Here’s a mono anechoic sample of a snare drum to get you started: SnareDrum.wav
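If you want to check a sample’s format before loading it, something like the following javax.sound.sampled snippet will do it (a quick standalone check, not part of the application; it uses the snare drum file above as an example):

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import java.io.File;

    // Prints the audio format of a WAV file and flags whether it meets the
    // 44.1kHz / 8, 16 or 24 bit requirement described above.
    public class SampleCheck {
        public static void main(String[] args) throws Exception {
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("SnareDrum.wav"));
            AudioFormat fmt = in.getFormat();

            boolean usable = fmt.getSampleRate() == 44100f
                    && (fmt.getSampleSizeInBits() == 8
                     || fmt.getSampleSizeInBits() == 16
                     || fmt.getSampleSizeInBits() == 24);

            System.out.println(fmt);
            System.out.println(usable ? "Format looks usable" : "Unsupported format");
            in.close();
        }
    }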
The application will sweep the sample from 0° (dead ahead) around your head anticlockwise in 15° steps. You can also dump the generated binaural sounds as WAV files by clicking ‘Generate files…’.
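Roughly speaking, the sweep and the ‘Generate files…’ option boil down to a loop like the one below. Again this is a sketch rather than the application’s actual code: it reuses the convolution sketch from earlier, and the azimuth-to-HRIR lookup is left as a map you would populate from the LISTEN data yourself.

    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.util.Map;

    public class AzimuthSweep {

        // Convert interleaved stereo floats (-1..1) to 16-bit little-endian PCM bytes.
        static byte[] toPcm16(float[] samples) {
            byte[] bytes = new byte[samples.length * 2];
            for (int i = 0; i < samples.length; i++) {
                int s = (int) Math.max(Math.min(samples[i] * 32767f, 32767f), -32768f);
                bytes[2 * i] = (byte) (s & 0xff);
                bytes[2 * i + 1] = (byte) ((s >> 8) & 0xff);
            }
            return bytes;
        }

        // Write one binaural WAV per 15-degree azimuth step.
        // hrirByAzimuth maps azimuth (degrees) to { left IR, right IR }.
        static void sweep(float[] mono, Map<Integer, float[][]> hrirByAzimuth) throws Exception {
            AudioFormat stereo16 = new AudioFormat(44100f, 16, 2, true, false);
            for (int az = 0; az < 360; az += 15) {
                float[][] hrir = hrirByAzimuth.get(az);
                float[] out = HrirConvolver.binaural(mono, hrir[0], hrir[1]);
                byte[] pcm = toPcm16(out);
                AudioInputStream stream = new AudioInputStream(
                        new ByteArrayInputStream(pcm), stereo16, pcm.length / stereo16.getFrameSize());
                AudioSystem.write(stream, AudioFileFormat.Type.WAVE,
                        new File(String.format("binaural_%03d.wav", az)));
            }
        }
    }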
Don’t forget that you’ll need to listen to the sounds with headphones to experience the effect.
Hi,
I was searching for Binaural+Java and stumbled onto your post
http://frisnit.com/2010/02/22/generating-binaural-sounds-for-immersive-audio-environments/
Got a lot of info on impulse responses/HRTFs etc. Thanks very much.
We are planning to use the IRCAM files + Matlab for some hack. Had a look at the Java-based binaural app as well.
Had a couple of questions:
We used an IRCAM sample file (for HRTF) with audio input and generated the binaural audio files. The audio seems to work well when it’s transitioning (i.e. when audio files with different azimuths are played in sequence). However, if we just use specific files (say 045.wav and 135.wav – two different audio files) and embed them in HTML (HTML5 audio tag), we can’t distinguish the audio clearly (e.g. right front vs right back).
Tried processing the generated binaural audio with Audacity as well, but in vain.
Is there anything to take care of in terms of post processing these generated binaural files to achieve the desired result?
Or should we generate the audio with different elevations for better distinction? Please let me know your thoughts.
Also, binaural seems like a pretty powerful concept, but it doesn’t look like it has gained much traction (based on a Google search). Is there any specific reason it hasn’t been adopted more widely (say, in cinemas etc.)?
Thanks
Venu
I found the front and back files difficult to distinguish too. I guess the front-back effect is more dependent on the particular listener’s head and ears, while the left-right effect is mainly just down to the relative volume in the left and right headphones. It could be that it works better when used alongside visual cues, in the case of a 3D game for instance. It’d be interesting to get a set of personal impulse responses made and see if sounds made with them are more convincing.