Creating Immersive Listening Experiences with Binaural Recording Techniques
Published online: 3 March 2015
- DOI: http://dx.doi.org/10.18177/sym.2015.55.mbi.10863
- PDF: https://www.jstor.org/stable/26574394
Sound recording is a powerful medium that can capture, preserve, and reproduce a musical performance for future audiences. Like works in other artistic media, sound recordings can be challenging and meaningful artworks. Sound recording differs from other media, however, in that it is highly experimental and partially scientific. One advantage of recording as an artistic medium is that it can produce entirely new listening experiences, allowing listeners to hear music from a different perspective: to observe the details and attributes of the sound material as they would be experienced from specific listening positions. Binaural recording, a collection of recording techniques that attempt to capture the three-dimensional response of human hearing to a sound source, offers many possibilities for creating exciting new listening experiences. I will discuss the underlying principles of binaural recording techniques and demonstrate their effectiveness in multiple contexts.
Humans are able to estimate a sound’s origin through a process known as sound localization, which relies upon information, or cues, collected by each ear. Humans experience sounds binaurally by comparing the cues received by the left and right ears, and can localize sounds in three dimensions: above and below, in front and behind, and to either side. The brain, inner ear, and outer ear work together to estimate a sound’s location from a combination of inter-aural time differences (ITD), inter-aural intensity differences (IID), and head-related transfer functions (HRTFs), which describe how sound is altered by the head, ears, and torso. The brain’s ability to process and interpret these cues makes it possible for humans to experience sound three-dimensionally.
Humans detect the left-to-right location of a sound by comparing two cues received by the ears: inter-aural time differences (ITD) and inter-aural intensity differences (IID). Time differences occur because a sound coming from one side must travel farther to reach the far ear than the near ear, and intensity differences occur because the head shadows the far ear, so the sound arrives there at a lower level. If a sound reaches both ears at the same time, the listener will identify the sound as coming from either directly in front or directly behind. When a sound arrives at one ear before the other, it is localized toward the side of the earlier ear. Time differences are more effective for locating low frequency sounds, and intensity differences are more effective for locating high frequency sounds. Time and intensity cues are less effective at determining the elevation or front-to-back location of a sound, so human anatomical features and phase must also be considered.
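To give a sense of the magnitudes involved in inter-aural time differences, the following is a minimal sketch that uses the Woodworth spherical-head approximation, ITD = (r/c)(θ + sin θ). That formula, the head radius of 8.75 cm, and the speed of sound of 343 m/s are assumptions brought in for illustration; none of them comes from this article, and real heads are not rigid spheres.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C (assumed)
HEAD_RADIUS = 0.0875     # m, a commonly quoted average head radius (assumed)


def itd_woodworth(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate inter-aural time difference in seconds.

    Models the head as a rigid sphere and the source as distant.
    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    The approximation is intended for azimuths between -90 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        microseconds = itd_woodworth(azimuth) * 1e6
        print(f"{azimuth:3d} degrees -> {microseconds:6.1f} microseconds")
```

Under these assumptions the time difference is zero for a source straight ahead and grows to roughly 0.66 ms for a source directly to one side, which illustrates how small the timing cues used by the auditory system are.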
The shape of the head, the spacing between the ears, and the structure of the outer ear (pinna) alter incoming sounds in a manner dependent upon their direction of origin. These alterations are described by head-related transfer functions (HRTFs) and arise largely from the acoustical phenomenon known as phase. Phase may be described as the lead or lag time between the arrivals of two sounds. Depending upon that lag time, phase differences can cause certain frequencies within a sound to be reinforced, reduced, or cancelled completely. This phenomenon, known as interference, effectively emphasizes or attenuates different frequencies before they reach the eardrum, a process known as spectral filtering. Because of HRTFs and interference, human hearing is direction-dependent and responds to the audible frequency spectrum with peaks and notches at certain frequencies. Anatomical features such as hair, facial structure, and head thickness can change phase relationships, alter HRTFs, and, as a result, change how a person perceives a sound. Listeners determine a sound’s direction from its spectral qualities, based upon a learned familiarity with their own body’s HRTFs.
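The interference described above can be illustrated with the simplest possible case: a direct sound summed with a single delayed copy of itself, which produces a comb-shaped frequency response. The sketch below is only an illustration of that principle and not a model of a real pinna; the function names and the 0.1 ms delay are hypothetical values chosen for the example.

```python
import math


def comb_gain(freq_hz, delay_s, copy_gain=1.0):
    """Magnitude of a direct sound plus one delayed copy at a given frequency.

    A result of 2.0 means full reinforcement (for copy_gain = 1.0) and a
    result near 0.0 means complete cancellation.
    """
    phase = 2 * math.pi * freq_hz * delay_s
    real = 1 + copy_gain * math.cos(phase)
    imag = -copy_gain * math.sin(phase)
    return math.hypot(real, imag)


def notch_frequencies(delay_s, max_hz=20000.0):
    """Frequencies (Hz) up to max_hz where the delayed copy cancels the direct sound.

    Cancellation occurs when the delay equals an odd number of half periods,
    that is, at f = (2k + 1) / (2 * delay).
    """
    notches = []
    k = 0
    while (2 * k + 1) / (2 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return notches


if __name__ == "__main__":
    delay = 0.0001  # 0.1 ms, a hypothetical lag between two arrivals
    for f in notch_frequencies(delay):
        print(f"notch near {f:8.1f} Hz, gain {comb_gain(f, delay):.3f}")
```

Changing the delay moves the notches, which is one way to see why a direction-dependent lag between arrivals results in a direction-dependent pattern of peaks and notches.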
In an ordinary stereo recording setup, microphones are spaced and angled in order to capture an appropriate stereo image, that is, the perceived left-to-right and front-to-back spatial locations of a sound source. The signal of each microphone goes to a separate track, and these tracks are panned left and right accordingly. Angling microphones produces level differences between channels, and spacing microphones produces time differences; a combination of time and intensity differences between the two channels forms the stereo image. If a sound is closer to one microphone, it will be louder in that microphone and softer in the other; it will therefore sound as though it is coming from the appropriate stereo position when played back. Stereo recording techniques capture some directional cues but lack the more specific phase relationships that the ear relies upon for three-dimensional sound localization.
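As a rough illustration of how a spaced pair produces both kinds of difference, the sketch below computes the arrival-time and level differences between two omnidirectional microphones for a single point source. The geometry, the inverse-distance (1/r) level falloff, the 343 m/s speed of sound, and the function name are all simplifying assumptions made for this example; room reflections and microphone directivity are ignored.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def spaced_pair_cues(source_xy, mic_spacing_m):
    """Return (time_difference_s, level_difference_db) for a spaced omni pair.

    The microphones sit on the x axis at -spacing/2 (left) and +spacing/2
    (right). source_xy is the (x, y) position of a point source in metres.
    Positive results mean the right microphone receives the sound earlier
    and louder. The source must not coincide with a microphone position.
    """
    x, y = source_xy
    half = mic_spacing_m / 2
    dist_left = math.hypot(x + half, y)
    dist_right = math.hypot(x - half, y)
    time_difference = (dist_left - dist_right) / SPEED_OF_SOUND
    level_difference = 20 * math.log10(dist_left / dist_right)
    return time_difference, level_difference


if __name__ == "__main__":
    for position in ((0.0, 3.0), (1.0, 3.0), (2.0, 3.0)):
        dt, dl = spaced_pair_cues(position, mic_spacing_m=0.5)
        print(f"source at {position}: {dt * 1e3:.3f} ms, {dl:.2f} dB")
```

A centered source yields no difference in either cue, while moving the source toward one microphone makes that channel both earlier and louder, which is the basis of the stereo image described above.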
Ordinary stereo reproduction involves the playback of a recording through two fixed sound sources: a spaced pair of loudspeakers. The disadvantage of stereo reproduction lies in its inability to provide sufficient directional cues to allow a listener to pinpoint the precise position of a sound source. Since there is no center speaker in stereo reproduction, a stereo recording reproduced using two loudspeakers relies upon a combination of directional cues and believable illusion to form a perceptual image in the center.[1] In most cases stereo reproduction from loudspeakers can only achieve a modest impression of three-dimensional spatiality, since it uses only left-right information and comes from the front quadrant only.[2]
Listening to stereo recordings on headphones does not give an accurate sense of sound location, because the acoustic crosstalk that occurs during loudspeaker-based stereo reproduction is removed. Crosstalk refers to the fact that, during loudspeaker listening, the right channel is heard not only by the right ear but also by the left ear, and vice versa. Without acoustic crosstalk, listeners are unable to perceive a concrete image in the center of the stereo field, and there is a perceptual “hole in the middle.” The end result is that stereo recordings reproduced over headphones sound as though half of the sound is huddled at the left ear and the other half at the right ear, with nothing in the center.
Binaural recording goes beyond stereo recording by attempting to capture human hearing’s response to sound in terms of phase, directionality, and physical separation. If sounds are recorded at positions similar to those of a listener’s ears and reproduced at the ears of a human listener through headphones, then the listening experience may be preserved, including the placement of sound sources in the environment, the listening perspective, and the acoustic characteristics of the venue.
Binaural playback is based upon the premise that the most accurate reproduction of spatial listening cues will be achieved if the ears of the listener are provided with the same signals that they would have experienced during ordinary listening. In order to experience the three-dimensional sensation of a binaural recording, the left and right playback channels must be reproduced directly in the left and right ears. This guarantees that sound recorded to a specific channel is only reproduced in a certain ear and frees the reproduced sound from crosstalk between the left and right channels and acoustic interaction with the listening environment. Consequently, binaural recordings are best experienced through headphones.
Since the Sony Walkman was first introduced in the 1970s, headphone listening has become commonplace. The widespread acceptance of headphones as a means of reproduction, reinforced by the popularity of portable listening devices such as iPods, has increased the viability of binaural techniques. Furthermore, from a listener's standpoint, headphones have some discernible advantages. The listening environment has less effect on sounds reproduced through headphones, so environmental noise or a poor acoustic space should have significantly less influence on the listening experience. Headphones can also be a relatively inexpensive way to listen to music when compared to loudspeakers. Binaural recordings do not depend upon the highest fidelity playback system, although better playback equipment will deliver improved sound quality.
Binaural Reproduction with Loudspeakers
Though it is desirable for binaurally recorded material to be reproducible directly with loudspeakers, playback of a binaural recording through ordinary loudspeakers is problematic: crosstalk and acoustic interaction with the listening environment disrupt the binaural effect. Crosstalk destroys much of the realism of binaural recordings and is currently being addressed by multiple processes that attempt to cancel out the additional signal paths in either the original recording or the listening environment. Even so, a binaural recording experienced through ordinary loudspeakers may still produce interesting dimensional effects, although it will not deliver the intended three-dimensional sensation.
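One general way to think about cancelling the additional signal paths is to treat the two loudspeakers and two ears as a 2×2 system and pre-filter the binaural signals with the inverse of that system. The sketch below does this at a single frequency using a toy symmetric model; the cross-path gain of 0.7, the 0.25 ms cross-path delay, and all function names are hypothetical, and this is not a description of any particular commercial process. Practical systems rely on measured transfer functions across the whole frequency range and typically need regularization.

```python
import cmath


def speaker_to_ear_matrix(freq_hz, cross_gain=0.7, cross_delay_s=0.00025):
    """Toy 2x2 transfer matrix from (left, right) speakers to (left, right) ears.

    The direct paths have unit gain; the crosstalk paths are attenuated and
    delayed. Both parameter values are hypothetical.
    """
    cross = cross_gain * cmath.exp(-2j * cmath.pi * freq_hz * cross_delay_s)
    return [[1 + 0j, cross], [cross, 1 + 0j]]


def invert_2x2(m):
    """Invert a 2x2 complex matrix, refusing near-singular input."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular at this frequency")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


if __name__ == "__main__":
    acoustic_paths = speaker_to_ear_matrix(1000.0)
    canceller = invert_2x2(acoustic_paths)
    # Canceller followed by the acoustic paths should approximate identity:
    # each binaural channel then reaches only its intended ear.
    at_ears = matmul_2x2(acoustic_paths, canceller)
    for row in at_ears:
        print([f"{abs(value):.3f}" for value in row])
```

In this idealized model the combined response is the identity matrix, meaning the left signal arrives only at the left ear and the right signal only at the right ear. The result holds only for a listener in the exact position assumed by the model.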
Binaural Recording Techniques
All binaural recording techniques, to some extent, attempt to preserve the head-related transfer functions (HRTFs) created by human anatomy. Binaural techniques fall into three categories: baffled techniques, head-worn microphones, and “dummy head” techniques. In each of these techniques, it is the spacing between microphones that creates time differences; an additional device, however, functions as an acoustic baffle, providing supplementary cues comparable to those produced by human anatomy.
“Dummy Head” Recording Techniques
“Dummy head” recording techniques utilize a specially constructed model of a human head outfitted with omnidirectional microphones, which pick up sound equally well from all directions. “Dummy head” microphone arrays are modeled upon human anatomical properties: the shape of an average human head, nose, pinnae, and ear canals. The shoulders and torso are also considered important by some, since they can contribute to the head-related transfer functions (HRTFs). See Example 1.
Baffled Microphone Techniques
Baffled microphone techniques utilize a sound-absorbent acoustic baffle placed between two omnidirectional microphones. Perhaps the most common baffled technique is the Jecklin disk. The Jecklin disk utilizes two omnidirectional microphones separated by a circular baffle that is 12 inches in diameter and covered by acoustic foam. See Example 2. Baffled microphone techniques simulate the shadowing effect of the head but do not replicate the filtering effects of the outer ear. As a result, recordings made using baffled microphone techniques may have better loudspeaker compatibility than other binaural approaches.
Head-Worn Microphones
Binaural recordings may be made with small microphones worn in the ear canals of a human listener. Head movements, which help resolve directional confusion for human listeners, are not captured by the previously discussed binaural techniques. In a real life listening situation, head movements by the listener will result in a different perspective relative to the individual sound sources. As a result, head movements can be used to resolve perceptual confusion and to improve localization.
The potential applications of binaural recording techniques are wide ranging within the recording arts. I have produced several binaural recordings using a variety of different recording techniques in many musical genres. The recorded examples that I have included were recorded under vastly different conditions and from different listener perspectives. In any case, these examples should be monitored through headphones in order to accurately perceive the binaural sensation.
In the first example, a performance of Richard Wagner’s Ride of the Valkyries, a Jecklin disk was utilized along with a matched pair of Earthworks QTC-30 microphones. The microphone array was placed 20 feet back from the orchestra at the same height as the ears of the conductor. Example #3 demonstrates the result of this technique.
Example 3, Richard Wagner’s Ride of the Valkyries
Contemporary Music and Environmental Recordings
Example #4 was recorded in an outdoor setting. Performers were scattered throughout the performance space, an amphitheater within a valley area. A Jecklin disk was utilized along with a matched pair of Earthworks QTC-30 microphones. In this recording, environmental sounds can also be heard, and these were a welcome component of this performance.
Example 4, structured contemporary music improvisation
Binaural recordings have been largely absent from popular music; however, binaural techniques could provide a recording engineer and aspiring artist with a method of setting themselves apart from the vast assortment of recorded music. Utilizing binaural recording techniques in popular music could recreate the intimacy of a live performance and could offer a departure from the tradition of recording in a dead space and superimposing artificial reverb. Binaural techniques could also add a unique dimensional effect that could create additional interest in a project.
Example #5 demonstrates the incorporation of binaural recording techniques in the context of a multi-track popular music recording. This recording combined binaural techniques with conventional recording techniques: direct insertion for the keyboards and electric bass, close-miking of individual drums, and the use of virtual instruments for organ and orchestral sounds. A Jecklin disk equipped with a matched pair of Lewitt LCT 340 microphones was used to record the drums and guitar, and a specially constructed “dummy head” was used in conjunction with a matched pair of Earthworks QTC-30 microphones to record the vocal tracks. Vocal tracks were recorded with the vocalists standing in different positions around the dummy head, which added an intimate, multi-dimensional perspective to the project. The binaural sensation is noticeable when monitoring through headphones; however, the recording also sounds quite spacious when monitored through traditional loudspeakers.
Example #5, popular music ballad
Comparison with a Stereo Example
Example #6 provides a strong contrast to the previous binaural examples. It was recorded in the same concert hall as Example #3 (Wagner), but with a spaced pair of Earthworks QTC-30 microphones. In both Examples #3 and #6, the natural ambience of the hall was captured, but the audible difference lies in the reproduced sound, specifically the spaciousness of the recording. Over headphones, the sounds of the Rossini example seem to be nestled at the left and right ears, with nothing in the center. The individual phase relationships required to generate a binaural sensation are absent. In addition, the acoustic crosstalk that normally takes place in loudspeaker-based stereo reproduction is removed during headphone listening, so listeners are unable to perceive a concrete phantom image in the center of the stereo field. Example #6 demonstrates these observations.
Example #6, Gioachino Rossini’s Introduction, Theme and Variations for Clarinet
Binaural recording techniques provide recording engineers and sonic artists with methods to create an ever more immersive soundscape, and they are equally viable in the concert hall, the recording studio, and outdoor environments. The value of binaural recording lies in enabling curious listeners to experience sound in a way that would otherwise be impossible, with an exceptional degree of spaciousness. Nevertheless, the final success of binaural recording techniques relies upon listener preferences and upon audience acceptance of and reaction to binaural recordings, which have never received a thorough assessment.
1. Rumsey and McCormick, Sound and Recording, 482.
Alten, Stanley R. Audio in Media. 10th ed. Boston: Cengage Learning, 2013.
Ballou, Glen, ed. Electroacoustic Devices: Microphones and Loudspeakers. Oxford: Focal Press, 2009.
Borwick, John. Loudspeaker and Headphone Handbook. 3rd ed. Oxford: Focal Press, 2001.
Eargle, John. The Microphone Book. 2nd ed. Oxford: Focal Press, 2004.
Fontana, Simone, Angelo Farina, and Yves Grenier. “Binaural for Popular Music: A Case of Study.” Proceedings of the 13th International Conference on Auditory Display (June 26-29, 2007): 85-90.
Howard, David M. and Jamie Angus. Acoustics and Psychoacoustics. 4th ed. Oxford: Focal Press, 2009.
Kahrs, Mark and Karlheinz Brandenburg, eds. Applications of Digital Signal Processing to Audio and Acoustics. Boston: Kluwer Academic Publishers, 1998.
Møller, Henrik. “Fundamentals of Binaural Technology.” Applied Acoustics 36 (1992): 171-218.
Rumsey, Francis and Tim McCormick. Sound and Recording. 6th ed. Oxford: Focal Press, 2009.
Last modified on Thursday, 07/03/2019
Shane Hoose is active as a recording engineer, composer, and percussionist. He holds degrees in music from the University of Iowa (Ph.D.), Bowling Green State University (MM), and Ball State University (BM). As an engineer he has recorded all styles of music. Most recently, he has recorded multiple album projects utilizing analog recording technology, including The Westbrook Trio’s Postmodern Man and Idylwild’s Faces. Hoose is an active clinician in the area of music technology and has recently given presentations at conferences of the Technology Institute for Music Educators, the College Music Society, and the Music and the Moving Image Conference at New York University. Dr. Hoose serves as Assistant Professor of Music Industry/Recording Arts at Eastern Kentucky University.