Understanding HRTF Audio: How It Works and Why It Matters
A head-related transfer function (HRTF) is a response that characterizes how an ear receives a sound from a point in space. When a sound is made, it travels outward from the source in every direction, like a rapidly expanding sphere.
Sounds reflect off objects near the source, so the sound waves reach the listener from many different directions, and sometimes reach the listener's ear canal directly. When the sound waves reach the listener, they are transformed by the listener's body.
The ears, head, shoulders and even the torso contribute to HRTF. Most notably, the size and mass of the head, the shape of the ear, the length and diameter of the ear canal, and the dimensions of the oral and sinus cavities all manipulate the incoming sound waves by boosting some frequencies and attenuating others.
These changes in the frequency profile of a sound help create a unique perspective and perception for the listener. These changes also help the listener pinpoint the location of the sound source.
In other words, the HRTF is the phase and frequency response of our head. These changes are dictated by the structure of our head: nose, forehead, mouth, hair, bone density, auricles… every feature that the sound hits before reaching our eardrums. Depending on where the sound comes from (in front, behind, above, below), it is going to encounter different obstacles.
Needless to say, every one of us has a different physical structure, and for this reason no two people's HRTFs are ever going to be identical.
HRTF describes how a given sound wave input (parameterized as frequency and source location) is filtered by the diffraction and reflection properties of the head, pinna, and torso, before the sound reaches the transduction machinery of the eardrum and inner ear.
Linear systems analysis defines the transfer function as the complex ratio between the output signal spectrum and the input signal spectrum as a function of frequency. Blauert (1974; cited in Blauert, 1981) initially defined the transfer function as the free-field transfer function (FFTF). Other terms include free-field to eardrum transfer function and the pressure transformation from the free-field to the eardrum.
Generally speaking, the HRTF boosts frequencies from 2-5 kHz with a primary resonance of +17 dB at 2,700 Hz.

The Science Behind Sound Localization
Humans have just two ears, but can locate sounds in three dimensions: in range (distance), in direction above and below (elevation), and in front, to the rear, and to either side (azimuth). This is possible because the brain, inner ear, and the external ears (pinnae) work together to make inferences about location.
Humans estimate the location of a source by taking cues derived from one ear (monaural cues), and by comparing cues received at both ears (difference cues or binaural cues). Among the difference cues are time differences of arrival and intensity differences.
The monaural cues come from the interaction between the sound source and the human anatomy, in which the original source sound is modified before it enters the ear canal for processing by the auditory system. These modifications encode the source location and may be captured via an impulse response which relates the source location and the ear location.
This impulse response is termed the head-related impulse response (HRIR). Convolution of an arbitrary source sound with the HRIR converts the sound to that which would have been heard by the listener if it had been played at the source location, with the listener's ear at the receiver location.
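The convolution step described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a production renderer: the function name `render_binaural` and the two toy HRIRs are made up for the example, and real HRIRs would come from a measured dataset.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair.

    Both HRIRs must have the same length. Returns an array of shape
    (len(mono) + len(hrir) - 1, 2) holding the left and right channels.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Toy HRIRs (made up, not measured): the source is on the left, so the
# right ear receives a copy that is delayed by 3 samples and half as loud.
hrir_left = np.array([1.0, 0.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.0, 0.5])

click = np.array([1.0, 0.0, 0.0])   # a single-sample click as the source
stereo = render_binaural(click, hrir_left, hrir_right)
print(stereo)
```

Played over headphones, the two columns of `stereo` would carry the level and timing differences baked into the HRIR pair, which is what places the sound at the HRIR's source location.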
The HRTF can also be described as the modifications to a sound from a direction in free air to the sound as it arrives at the eardrum.
These modifications depend on the shape of the listener's outer ear, the shape of the listener's head and body, the acoustic characteristics of the space in which the sound is played, and so on.
There are three fundamental ways that humans determine the location of a sound source. The first is the interaural level difference, or ILD. The second is the interaural timing difference, or ITD. What makes a binaural microphone so powerful is that it captures a third sound localization cue: the HRTF. The name sounds complicated, but a transfer function is just the effect that a component has on the signal.
Interaural level differences and interaural timing differences alone can be ambiguous. For instance, imagine a sound that arrives at both ears at the same time and is the same level in each ear. Level and timing alone cannot say whether that sound is directly in front, directly behind, or overhead. The frequency filtering described by the HRTF helps resolve this. If a sound comes from the left side, it will not only be louder overall in the left ear, but the high frequencies will also be attenuated or reflected before they reach the right ear.
The shape of the pinnae also plays into this, filtering sound differently depending on the angle at which the sound arrives.
How HRTF is Measured and Used
One method used to obtain the HRTF for a given source location is to measure the head-related impulse response (HRIR), h(t), at the eardrum for an impulse δ(t) placed at the source. The HRTF, H(f), is then the Fourier transform of the HRIR h(t).
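Because the HRTF and the HRIR are a Fourier-transform pair, a measured impulse response can be turned into a frequency response with an FFT. The sketch below is illustrative only: the sample rate and the toy HRIR (a pure two-sample delay at 0.8 gain) are assumptions, not measured data.

```python
import numpy as np

fs = 48_000                 # assumed sample rate in Hz
hrir = np.zeros(64)
hrir[2] = 0.8               # made-up HRIR: 2-sample delay, gain 0.8

hrtf = np.fft.rfft(hrir)                       # complex H(f), one value per bin
freqs = np.fft.rfftfreq(len(hrir), d=1 / fs)   # frequency of each bin in Hz

magnitude = np.abs(hrtf)    # a pure delay leaves every frequency at gain 0.8

# The inverse transform takes the HRTF back to the original HRIR.
recovered = np.fft.irfft(hrtf, n=len(hrir))
```

A real HRIR would give a magnitude curve with peaks and notches rather than a flat line, but the conversion between the two representations is the same.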
Even when measured for a "dummy head" of idealized geometry, HRTFs are complicated functions of frequency and the three spatial variables. For distances greater than 1 m from the head, however, the HRTF can be said to attenuate inversely with range. It is this far-field HRTF, H(f, θ, φ), that has most often been measured.
HRTFs are typically measured in an anechoic chamber to minimize the influence of early reflections and reverberation on the measured response. HRTFs are measured at small increments of θ such as 15° or 30° in the horizontal plane, with interpolation used to synthesize HRTFs for arbitrary positions of θ.
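As a rough sketch of the interpolation step, the simplest option is to blend linearly between the two measured responses that bracket the requested angle. This is an assumption made for illustration: the function name `interpolate_hrir` and the toy data are hypothetical, and practical systems often use more careful schemes (for example, aligning delays before blending).

```python
import numpy as np

def interpolate_hrir(azimuth_deg, measured_azimuths, measured_hrirs):
    """Linearly blend the two measured HRIRs that bracket azimuth_deg.

    measured_azimuths must be sorted ascending; angles outside the
    measured range are extrapolated, so callers should stay inside it.
    """
    az = np.asarray(measured_azimuths, dtype=float)
    hrirs = np.asarray(measured_hrirs, dtype=float)
    upper = int(np.clip(np.searchsorted(az, azimuth_deg), 1, len(az) - 1))
    lower = upper - 1
    weight = (azimuth_deg - az[lower]) / (az[upper] - az[lower])
    return (1 - weight) * hrirs[lower] + weight * hrirs[upper]

# Toy two-sample "HRIRs" measured every 15 degrees (made-up values).
azimuths = [0, 15, 30]
hrirs = [[1.0, 0.0], [0.8, 0.2], [0.6, 0.4]]

h20 = interpolate_hrir(20, azimuths, hrirs)   # one third of the way from 15 to 30
```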
In order to maximize the signal-to-noise ratio (SNR) in a measured HRTF, it is important that the impulse being generated be of high volume. In practice, however, it can be difficult to generate impulses at high volumes and, if generated, they can be damaging to human ears, so it is more common for HRTFs to be directly calculated in the frequency domain using a frequency-swept sine wave or by using maximum length sequences.
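The swept-sine approach can be simulated as follows. This is a sketch under stated assumptions, not a measurement procedure: the "ear" is a made-up five-tap filter (`true_hrir`), the sweep parameters are arbitrary, and the small regularization term `eps` is a choice made here to keep the spectral division stable where the sweep carries little energy.

```python
import numpy as np

fs = 48_000                       # assumed sample rate in Hz
duration = 0.5                    # sweep length in seconds
t = np.arange(int(fs * duration)) / fs
f0, f1 = 20.0, 20_000.0           # sweep start and end frequencies

# Linear sine sweep: instantaneous frequency rises from f0 to f1.
sweep = np.sin(2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * duration)))

# Stand-in for the real acoustic path (hypothetical values).
true_hrir = np.array([0.0, 1.0, 0.0, -0.4, 0.2])
recorded = np.convolve(sweep, true_hrir)      # what the ear microphone would pick up

# Divide output spectrum by input spectrum, with light regularization.
n = len(recorded)
X = np.fft.rfft(sweep, n)
Y = np.fft.rfft(recorded, n)
eps = 1e-9 * np.max(np.abs(X)) ** 2
H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)

estimated_hrir = np.fft.irfft(H, n)[: len(true_hrir)]
```

Because the sweep spreads its energy over time, it can be played at a comfortable level and still deliver a good signal-to-noise ratio, which is the motivation given above for avoiding a loud impulse.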
The head-related transfer function is involved in resolving the cone of confusion, a set of points on which the interaural time difference (ITD) and interaural level difference (ILD) are identical, so that sound sources at many different locations on the cone produce the same interaural cues.
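A deliberately simplified model makes the ambiguity concrete. The sketch below treats the ears as two points with no head between them and a distant source, so the ITD is just the extra path length divided by the speed of sound. The ear spacing, the function name `itd_seconds`, and the model itself are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, approximate value for room-temperature air
EAR_SPACING = 0.18       # m, assumed distance between the two ears

def itd_seconds(azimuth_deg):
    """ITD for a distant source; 0 degrees is straight ahead, 90 is hard right."""
    return EAR_SPACING / SPEED_OF_SOUND * math.sin(math.radians(azimuth_deg))

front_right = itd_seconds(30)    # 30 degrees to the right, in front
rear_right = itd_seconds(150)    # mirror-image position behind the listener

print(f"front: {front_right * 1e6:.1f} us, rear: {rear_right * 1e6:.1f} us")
```

Both positions give the same time difference in this model, so timing alone cannot separate front from back. The direction-dependent filtering described by the HRTF is what breaks the tie.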
When a sound is received by the ear, it can either go straight into the ear canal or be reflected off the pinna into the ear canal a fraction of a second later. The sound will contain many frequencies, so many copies of this signal will go down the ear canal, all at different times depending on their frequency (according to reflection, diffraction, and their interaction with high and low frequencies and the size of the structures of the ear).
These copies overlap each other, and during this, certain signals are enhanced (where the phases of the signals match) while other copies are canceled out (where the phases of the signal do not match). If another person's ears were substituted, the individual would not immediately be able to localize sound, as the patterns of enhancement and cancellation would be different from those patterns the person's auditory system is used to.
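The enhancement and cancellation can be illustrated with one direct path plus a single delayed reflection. The 0.1 ms delay and the function name `combined_gain` are hypothetical values chosen to make the arithmetic easy; a real ear involves many reflections with different delays and strengths.

```python
import numpy as np

def combined_gain(freq_hz, delay_s, reflection_gain=1.0):
    """Magnitude of a direct sound plus one delayed copy at a given frequency."""
    return abs(1 + reflection_gain * np.exp(-2j * np.pi * freq_hz * delay_s))

delay = 100e-6   # hypothetical 0.1 ms extra path via a reflection

# At 10 kHz the delay is one full period: the copies line up and reinforce.
print(combined_gain(10_000, delay))   # about 2

# At 5 kHz the delay is half a period: the copies are opposed and cancel.
print(combined_gain(5_000, delay))    # about 0
```

Changing the delay moves the peaks and notches to different frequencies, which is why a different set of ears produces a pattern the listener's auditory system is not used to.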
To assess how HRTFs vary from one person's ears to another's, we can restrict the analysis to the degrees of freedom of the head and its relation to the spatial domain. This eliminates tilt and other coordinate parameters that add complexity. For the purpose of calibration we are only concerned with the direction of the source relative to our ears, that is, a specific degree of freedom.
Applications of HRTF Technology
HRTF has various applications, enhancing audio experiences in different fields. Some of these include:
- Virtual Reality and Gaming: Creating realistic and immersive sound environments in virtual reality and games.
- Spatial Audio: Improving the quality and accuracy of spatial audio reproduction in headphones and speakers.
- Hearing Aids: Customizing hearing aids to individual ear shapes for better sound localization and clarity.
- Teleconferencing: Enhancing the sense of presence and directionality in teleconferencing systems.
Several microphones have been designed with these principles in mind. One example is the Neumann KU 100 microphone, which simulates the average size, density, and shape of a human head. As you can imagine, these are highly specialized microphones and are therefore prohibitively expensive.
There are a few versions of the FS mic depending on your budget. The FS Pro II has a DPA omni microphone in each ear and XLR outputs, maintaining professional-level recording quality. I use the FS XLR, which also has XLR outputs with slightly less expensive FS microphone capsules. There are also microphone kits that are designed to be inserted into your own ears for recording. These have disadvantages, however. For one, you will need to remain perfectly still and quiet during the recording, as any sound or movement will be permanently printed to the recording.
As a mixing engineer, you can utilize binaural panning plugins that will take a mono or stereo audio input and output a binaural rendering. One example of binaural technology on the listener’s end is the binaural rendering process for Dolby Atmos in headphones. In fact, several formats that utilize binaural rendering, including Dolby Atmos, now offer the option to create a custom HRTF rather than a standard algorithm based on a generic HRTF.
Recordings processed via an HRTF, such as in a computer gaming environment (see A3D, EAX, and OpenAL), which approximates the HRTF of the listener, can be heard through stereo headphones or speakers and interpreted as if they comprise sounds coming from all directions, rather than just two points on either side of the head.
Windows 10 and above come with Microsoft Spatial Sound included, the same spatial audio framework used on Xbox One and HoloLens 2. On a Windows PC or an Xbox One, the framework can use several different downstream audio processors, including Windows Sonic for Headphones, Dolby Atmos, and DTS Headphone:X, to apply an HRTF.
Apple similarly has Spatial Audio for its devices when used with headphones produced by Apple or Beats.
Linux is currently unable to directly process any of the proprietary spatial audio (surround plus dynamic objects) formats. SoundScape Renderer offers directional synthesis. PulseAudio and PipeWire can each provide virtual surround (fixed-location channels) using an HRTF. Recent PipeWire versions are also able to provide dynamic spatial rendering using HRTFs; however, integration with applications is still in progress.
The capabilities of this technology are endless and I’m looking forward to seeing how far it will go in the future!