
Ambisonics Technology Explained: A Deep Dive into Immersive Audio

Ambisonics is a full-sphere surround sound format or a means of representing the sound field at a point or in space. Unlike conventional stereo and surround sound formats (which are based on the principle of panning audio signals to specific speakers), ambisonics captures the full directivity information for every soundwave that hits the microphone.

Ambisonics is a concept for spatial audio capture, storage, and reproduction that differs fundamentally from other spatial audio approaches, which may be why it resides in its own microcosm. One can think of ambisonics as a concept built around a specific type of multichannel audio signal, the ambisonic signal, and it comprises the entire signal pipeline from capture to reproduction.

Ambisonic signals are multichannel signals. A very fundamental property of an ambisonic signal is its ambisonic order, or simply its order. The order determines the accuracy with which the encoded sound field is represented. A higher order means a more accurate representation.

Let us disregard for the time being where our ambisonic signal of interest might have originated and first look at how it can be reproduced. Ambisonic signals can be reproduced through loudspeaker arrays as well as binaurally over headphones; loudspeaker arrays are the original approach.

The term ambisonics is derived from Latin and may be translated as ‘surround sound’. It was pioneered in the 1970s but did not experience commercial success at the time. A small number of enthusiasts kept the ball rolling over the decades until the ready availability of multichannel audio hardware and of sufficient digital processing power led the academic research community to pick it up again around the early 2000s. From that point on, the concept matured to the extent that it was finally ready to trickle from academia into practice. Software tools have been freely available as open source for years, and the significance of ambisonics in the industry keeps growing.

Ambisonics as a scientific theory and practice was developed by Michael Gerzon, Peter Fellgett and Geoffrey Barton in the early 1970s at the University of Oxford and the University of Surrey. The first SoundField Microphone was invented by Gerzon and Professor Peter Craven in 1975 and developed for commercial release in 1978. Ambisonics was way ahead of its time and remained a high-end, expensive niche technology until very recently.

Ambisonics Audio Explained

Understanding the Basics of Ambisonics

The channels of the ambisonic signal are related to each other through an advanced mathematical framework that describes the physical structure of a sound field. One speaks of the sound field or sound scene being encoded into ambisonics. The underlying mathematical framework is indeed rather involved. It is based on functions termed spherical harmonics, or spherical surface harmonics. These spherical harmonics are used extensively in a variety of fields of physics, including quantum mechanics. The good news is that it is absolutely not necessary to understand the underlying mathematics to be able to work with ambisonics. We therefore do not dive deeply into it here.

From a practical point of view, an ambisonic signal is a multichannel audio signal. But because the set of channels constitutes a comprehensive representation of an entire audio scene, we will sometimes use the terms ‘ambisonic representation’ or ‘ambisonic content’ or similar to refer to it. One important aspect to understand is that an ambisonic representation relates to a given point in space, which is the vantage point from which the sound scene is observed. Imagine someone playing a violin in a room. One can place a suitable arrangement of microphones (a so-called microphone array) into the room and obtain an ambisonic representation of the physical structure of the sound field around that microphone array. Doesn’t this sound like a very powerful framework?

What is a little unfortunate is that the ambisonic representation of the sound field is rather abstract, so one cannot directly read from the signals things like how many sound sources there are in the scene, where they are located, or how strong the reverberation is (this is, by the way, a very difficult endeavor with any signal representation). The powerful aspect of ambisonics is that the sound field that an ambisonic signal represents can be physically re-created, even though we do not know in tangible terms what exactly the sound field contains. In other words, one can drive an array of loudspeakers such that the sound waves that the individual loudspeakers emit superpose and make up the original captured sound field.

It is even possible to render the ambisonic signal binaurally, which is equivalent to virtually placing a human head in the sound field and computing the signals that would arise at the ears of a person listening to the original scene at the location of the microphone arrangement that captured the ambisonic signal. We will explain this in more detail later. Summarizing the above and attempting a definition of ambisonics, one could say that ambisonics is the combination of a specific multichannel representation of a spatial audio scene, which is based on spherical harmonics, and the corresponding capture and reproduction technologies. Fig. 1 summarizes this.

Ambisonics signal chain

Figure 1: Two flow charts of the complete signal chain in ambisonics. The right chart includes an optional stage for editing of the sound scene. The editing can relate to both spatial and non-spatial information. Encoding is a mathematical process that converts microphone array signals or virtual sound scenes that are computer generated into an ambisonic signal. Decoding is a mathematical process that converts an ambisonic signal to loudspeaker or headphone signals.

As stated above, a fundamental property of an ambisonic signal is its order, which determines the accuracy with which the encoded sound field is represented. As a ballpark, one could say for now that any order above 5 may be considered high, and any order below that may be considered low. As you might have guessed, a higher-order signal has more channels than a lower-order signal. If N is the order, then the ambisonic signal has (N+1)² channels (a 0th-order signal has 1 channel, a 1st-order signal has 4 channels, and a 7th-order signal has 64 channels). Usually, the channels in an ambisonic signal are ordered according to the ambisonic channel number (ACN), in which the first channel comprises the 0th-order representation of the signal, the first 4 channels comprise the 1st-order representation, and so forth. Fig. 2 illustrates the components up to third order.
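The relationship between order, channel count, and ACN channel index described above can be written down in a few lines. This is a minimal sketch; the function and variable names are our own:

```python
# Sketch: ambisonic order vs. channel count, and ACN channel indexing.

def num_channels(order: int) -> int:
    """Total number of channels of an ambisonic signal of the given order."""
    return (order + 1) ** 2

def acn_index(n: int, m: int) -> int:
    """0-based ACN channel index of mode (n, m), with -n <= m <= n."""
    assert -n <= m <= n
    return n * n + n + m

print(num_channels(0))   # 1
print(num_channels(1))   # 4
print(num_channels(7))   # 64
print(acn_index(0, 0))   # 0 -> first channel (the omnidirectional component)
print(acn_index(1, -1))  # 1 -> second channel
```

Note how the index formula groups all modes of order n together: the 2n+1 modes of order n occupy channels n² through n² + 2n.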

Ambisonics B-format components

Figure 2: Visual representation of the Ambisonic B-format components up to third order. Dark portions represent regions where the polarity is inverted.

One can think of the first channel of the ambisonic signal as containing the signal that an omnidirectional microphone would capture when positioned at the vantage point from which the sound scene is observed. Channels 2, 3, and 4 contain signals that are similar to those that figure-of-eight microphones would capture at the vantage point when oriented along the three coordinate axes, respectively. Channels 5 and higher contain signals as they would be captured by microphones with more complicated and detailed directivities. Refer to Fig. 2 for an illustration of what directivity a single microphone would need to have to capture a signal similar to what is contained in a given channel of an ambisonic signal.

We deliberately use the term “similar” here because there are some delicate differences between what such notional microphones would capture and what the channels of an ambisonic signal contain. The differences are not relevant in this context, so we omit further detail here. Actual ambisonic signals are not obtained from literal microphones with corresponding directivities but from microphone arrays in which the signals from the different microphones are combined using signal processing to synthesize the required directivity, or they are computer generated. We will cover this later. There is one detail of the mathematical framework of spherical harmonics that we would like to introduce here: a given order of an ambisonic representation usually comprises several so-called modes (only the 0th order comprises a single mode). We usually use the symbol n to refer to the order that we are looking at and the symbol m to differentiate the modes. Recall Fig. 2. A pair of n and m then denotes a mode uniquely; m always runs from −n to n. Tab. 1 provides audio examples for the different channels of an example ambisonic signal that contains a scene composed of a saxophone, a guitar, and a double bass. The saxophone is located straight ahead of the vantage point, the guitar at 45° to the left, and the double bass at 45° to the right, as illustrated in Fig. 3.

Sound scene encoded

Figure 3: Sound scene encoded in the ambisonic signal from Tab. 1. The z-axis points perpendicularly out of the plotting plane.

Table 1: Audio examples of the individual channels of an ambisonic signal (saxophone + guitar + double bass, see Fig. 3). The channels are sorted according to the ACN scheme. Recall from Fig. 2 that the mode (n, m) = (0, 0) comprises the equivalent of an omnidirectional signal. You will hear all instruments equally loud in that channel. You might notice that the saxophone sounds quieter and more reverberant in the mode (n, m) = (1, −1). This is because the direct sound of the saxophone happens to impinge from a direction in which the ‘directivity’ of the mode has a null. As a consequence, the direct sound is suppressed (while the reverberation is not, as it impinges from many different directions). The situation is similar for the mode (n, m) = (1, 0), in which the direct sound is suppressed for all three sound sources because the horizontal plane lies entirely in the null of the directivity. The saxophone is louder in the mode (n, m) = (1, 1) because that mode has a lobe that points directly at the saxophone. The contents of most of the other channels are very difficult to interpret when listening to them. This holds especially true for all modes with n > 1. We conclude that in some situations it may be possible to make sense of what one hears when listening to a given channel of an ambisonic signal in isolation, but this will be very difficult in the general case. Ambisonic signals unfold their magic when the different channels are combined by means of mathematical operations.
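The suppression effects described in Tab. 1 can be reproduced with the first-order encoding gains of the three sources. The sketch below assumes ACN channel ordering and SN3D normalization (both are assumptions, as the article does not fix a normalization); azimuth is measured counterclockwise from straight ahead:

```python
import math

# Sketch: first-order encoding gains for the example scene
# (saxophone straight ahead, guitar 45 deg left, bass 45 deg right).

def first_order_gains(azimuth_deg: float, elevation_deg: float = 0.0):
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = 1.0                          # mode (0, 0): omnidirectional
    y = math.sin(az) * math.cos(el)  # mode (1, -1): left-right figure-of-eight
    z = math.sin(el)                 # mode (1, 0): up-down figure-of-eight
    x = math.cos(az) * math.cos(el)  # mode (1, 1): front-back figure-of-eight
    return [w, y, z, x]              # ACN channel order

for name, az in [("saxophone", 0.0), ("guitar", 45.0), ("double bass", -45.0)]:
    print(name, [round(g, 3) for g in first_order_gains(az)])
```

For the saxophone (azimuth 0°), the mode (1, −1) gain is sin 0 = 0, so its direct sound vanishes from that channel; and because all three sources lie in the horizontal plane, the mode (1, 0) gain is 0 for every one of them, exactly as the table describes.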

Decoding and Rendering Ambisonic Signals

Decoding ambisonic signals for a loudspeaker array means computing the signals with which a given set of loudspeakers needs to be driven to make a listener experience the sound scene that the ambisonic signal represents. Recall Fig. 2, which displays the directivity that a notional microphone needs to have to capture the signal that is in a given channel of an ambisonic signal. If we add the channels of an ambisonic signal that correspond to the modes (0, 0) and (1, −1), i.e., we add channels 1 and 2 if ACN is used, we obtain a new signal that represents the signal that would be captured by a notional microphone with the directivity depicted in Fig. 4 (left): a cardioid pointing in the positive y-direction (a slightly higher weight was put on the figure-of-eight directivity compared to the omnidirectional one in this figure). If all channels from Fig. 2 are added with a specific relative weighting (gain), then a new signal is obtained that represents the signal captured by a notional microphone with the directivity depicted in Fig. 4 (right): a highly directional microphone with a main lobe pointing in the positive y-direction.

In other words, the microphone in Fig. 4 (right) captures mostly those components of the sound field that impinge from the direction in which the main lobe points. If we position a loudspeaker in that very direction and drive it with that very signal, the loudspeaker reproduces the corresponding sound field components with the correct propagation direction. If this is done similarly for a suitable set of directions all around the listener, then the sound scene that is represented by the ambisonic signal is reproduced as a whole: it is decoded. Decoding works best if the loudspeakers are arranged on a sphere. Loudspeaker arrangements that deviate from spherical (such as ellipsoids and the like) can in principle be used, too, but there is no standard solution for computing the loudspeaker signals in these cases, and the result will vary depending on the chosen decoding method. Horizontal-only arrangements such as circles of loudspeakers are possible, too. Refer to Fig. 5 for examples.

As a general rule, the loudspeaker array needs to enclose the listening area, and the more it deviates from a spherical or circular shape, the more limitations arise with respect to reproduction accuracy. Reproduction of an Nth-order ambisonic signal technically requires at least (N+1)² loudspeakers to fully render all information in the signal. Some recent decoders such as AllRAD can work with fewer loudspeakers and with loudspeaker setups that are not homogeneous (for example, when there are no loudspeakers below the horizontal plane).
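The idea of feeding each loudspeaker a virtual directional microphone aimed at its own position can be sketched as a naive first-order "sampling" decoder for a horizontal ring of loudspeakers. This is an illustration only, not how production decoders such as AllRAD work; channel ordering (W, Y, Z, X) and normalization are assumptions:

```python
import math

# Sketch: naive first-order sampling decoder for a horizontal loudspeaker ring.
# Each loudspeaker receives a virtual cardioid aimed at its own azimuth.

def decode_ring(w, y, z, x, num_speakers=8):
    signals = []
    for k in range(num_speakers):
        az = 2.0 * math.pi * k / num_speakers   # loudspeaker azimuth
        # Cardioid aimed at az: 0.5 * (omni + figure-of-eight toward az).
        # z is unused because all loudspeakers lie in the horizontal plane.
        s = 0.5 * (w + x * math.cos(az) + y * math.sin(az))
        signals.append(s)
    return signals

# A source encoded straight ahead (W=1, Y=0, Z=0, X=1):
out = decode_ring(1.0, 0.0, 0.0, 1.0)
print([round(s, 3) for s in out])  # loudest at speaker 0 (front)
```

The front loudspeaker receives full gain, the rear loudspeaker (directly opposite) receives zero, and the remaining loudspeakers fall off in between, so the superposed sound waves re-create a source arriving from straight ahead.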

Given that an ambisonic signal is a representation of the physical structure of the captured sound field, it is theoretically possible to derive a mathematical formulation for the signals that a given loudspeaker array needs to be driven with so that the loudspeaker array physically reconstructs the encoded sound field. Such decoding could be considered the ideal reproduction of the encoded sound field and is illustrated in Fig. 6 (middle).

Ambisonic loudspeaker arrays

Figure 5: Photographs of ambisonic loudspeaker arrays. Left: Room Wilska at Aalto University (45 loudspeakers on a sphere). Right: IEM Cube (25 loudspeakers on a dome plus 5 subwoofers).

The Advantages of Ambisonics

Ambisonics offers several compelling advantages:

  • Immersive Sound: Creates a truly enveloping and realistic sound field.
  • Flexibility: Can be decoded for various speaker setups, from headphones to complex multi-speaker arrays.
  • Scalability: Can be scaled to any desired spatial resolution by adding more channels and speakers.
  • Compatibility: Ambisonics content can be folded down to stereo or mono without losing essential information.

Applications of Ambisonics

Ambisonics is finding increasing use in:

  • Virtual Reality (VR): Providing realistic and head-tracked spatial audio for immersive experiences.
  • 360° Video: Enhancing the sense of presence in 360° videos with spatial audio that adapts to the viewer's head movements.
  • Gaming: Creating more believable and engaging soundscapes in video games.
  • Film and Music Production: Offering new creative possibilities for surround sound mixing and mastering.

SoundField Microphones and the RØDE NT-SF1

Firstly, you need a SoundField microphone with four capsules arranged in a tetrahedral array - such as the RØDE NT-SF1 - and a four-track recorder. By rigging a single SoundField microphone in a sports stadium, for example, you can instantly generate immersive ambient crowd noise without needing multi-mic arrays.

The B-Format signal can be easily manipulated to emulate any type of microphone polar pattern - omni, cardioid, sub-cardioid, directional etc. You can then point these ‘virtual’ microphones in any direction you choose - to the right and left, to the rear, and even up and down.
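The virtual-microphone idea can be expressed as a single steerable pattern blended from the four B-format channels. This is a hedged sketch of the general principle, not the processing inside any particular plug-in; the pattern parameter p and the omission of per-channel scaling conventions (such as FuMa's W attenuation) are simplifying assumptions:

```python
import math

# Sketch: a steerable virtual microphone derived from B-format (W, X, Y, Z).
# p = 1 gives an omni, p = 0.5 a cardioid, p = 0 a figure-of-eight.

def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg, p=0.5):
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    directional = (x * math.cos(az) * math.cos(el)
                   + y * math.sin(az) * math.cos(el)
                   + z * math.sin(el))
    return p * w + (1.0 - p) * directional

# Point a virtual cardioid to the left (azimuth 90 deg) at a single sample
# of a source located on the left (W=1, Y=1):
s = virtual_mic(w=1.0, x=0.0, y=1.0, z=0.0, azimuth_deg=90.0, elevation_deg=0.0)
print(round(s, 3))  # 1.0: the source on the left is captured at full gain
```

Steering is just a change of azimuth_deg and elevation_deg, which is why a single tetrahedral recording can be re-pointed in any direction in post-production.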

It's increasingly important to be able to keep options open for post-production, either because the final project requirements are still evolving, or because of sub-optimal monitoring or set-up flexibility during the recording. Capturing sound using a SoundField microphone - whether the sound on-set on a 360° video shoot or when recording ambience to be added to a video game - allows for a fully head-tracked audio experience. Things to the left of the viewer will sound on the left, until they turn their head, when it will move to the centre as they face it.

The RØDE NT-SF1 is the first broadcast-grade ambisonic microphone made available for under USD$1000. By that we mean the combination of RØDE’s patented half-inch true condenser capsules and precision innovation creates an ambisonic microphone of equivalent quality to microphones several times more expensive. It’s an incredible microphone at any price.

Meanwhile, the SoundField by RØDE Plug-in is a bespoke companion plug-in available for free download on both Windows and Mac and allows users to endlessly reshape their audio. The SoundField by RØDE Plug-in is perfectly matched to the NT-SF1 Microphone and operates in a completely different way to traditional ambisonic processors. Eschewing the matrices and correction filters of previous generations, it utilises state-of-the-art frequency-domain processing to deliver unparalleled spatial accuracy at all frequencies. It's the technology that allows the simulation of shotgun-type microphone patterns derived from an array of closely spaced microphones.

Its quality and unbeatable price make it the perfect ambisonic microphone to have in your kit.

Ambisonics B-Format

In first-order Ambisonics, sound information is encoded into four channels: W, X, Y and Z. This is called Ambisonic B-format. The W channel is the non-directional mono component of the signal, corresponding to the output of an omnidirectional microphone. The X, Y and Z channels are the directional components in three dimensions. So with a four-channel file we have a mono channel and 3 positioning channels.

Channel   Description
W         Non-directional mono component (omnidirectional microphone)
X         Directional component (front-back)
Y         Directional component (left-right)
Z         Directional component (up-down)
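Encoding a mono source into these four channels reduces to a handful of trigonometric gains. The sketch below assumes the traditional Furse-Malham-style convention, in which W carries a 1/√2 gain; theta is the azimuth and phi the elevation of the source:

```python
import math

# Sketch: encoding one mono sample into traditional B-format (W, X, Y, Z),
# assuming the classic convention with a 1/sqrt(2) gain on W.

def encode_b_format(sample: float, theta: float, phi: float):
    w = sample * (1.0 / math.sqrt(2.0))           # omnidirectional component
    x = sample * math.cos(theta) * math.cos(phi)  # front-back
    y = sample * math.sin(theta) * math.cos(phi)  # left-right
    z = sample * math.sin(phi)                    # up-down
    return w, x, y, z

w, x, y, z = encode_b_format(1.0, 0.0, 0.0)  # source straight ahead
print(round(w, 3), round(x, 3), round(y, 3), round(z, 3))  # 0.707 1.0 0.0 0.0
```

A source straight ahead lands entirely in W and X, while Y and Z stay silent, matching the channel descriptions in the table above.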