Virtual Auditory Evaluation Techniques: Exploring Sound Localization in Virtual Reality
Identifying the direction of sounds is crucial for perceiving and interacting with the surrounding world. The human brain extracts spatial information about the environment by interpreting auditory cues, which result from the interaction of incoming sound waves with the listener's head and pinnae before they reach the ears.
In our daily lives, we often have multisensory access to sound sources, allowing us to perceive their positions and interact with them using our bodies. For example, we can physically move closer to or farther away from a sound source, or manipulate it directly, such as by grabbing a phone and bringing it to our ear. This multisensory experience of the acoustic space, and our ability to actively interact with sound sources for localization purposes, have prompted researchers to investigate different response methods in sound localization tasks and to explore the role of multisensory information in auditory space adaptation and learning.
Interestingly, recent studies have exploited multisensory information to develop effective training approaches for improving sound localization under altered hearing conditions. Some training methods provide visual feedback about source positions, while others leverage multisensory information and active interaction with the sound sources themselves. Several studies have demonstrated that both reaching movements and head movements enhance sound localization with modified auditory cues.

Figure 1 Experimental Timeline and Procedure. Schematic description of the experimental timeline. Participants first signed the consent form and took part in the audiometric examination. They then performed a sound localization task composed of 4 consecutive blocks carried out under different hearing conditions. Participants were divided into 3 groups, instructed to localize the sound sources by either naming, pointing to, or reaching toward them. Finally, participants were invited to answer some questions. We insert in the figure, three...
In the last decade, the study of multisensory and motor contributions in this research field has been facilitated by the increasing use of new technology based on virtual reality. Before that, sound localization had typically been studied using experimental setups comprising several loudspeakers located around the participant. More recently, sound localization has increasingly been investigated through a fully virtual approach: in several studies, researchers have exploited acoustic virtual reality and presented spatialized sound through headphones.
This is made possible by Head-Related Transfer Functions (HRTFs), which characterize the spectro-temporal filtering applied by a given head and pinnae to sounds arriving from a set of source positions. HRTFs can be measured for an individual listener, although this is often time-consuming and relatively costly; non-individual or “generic” HRTFs have also been used successfully to simulate binaural listening. Headphone-based virtual audio has been successfully employed to train sound localization skills, as well as to simulate hearing deficits in settings that are more complex and more controllable than the techniques typically adopted in previous work, such as monaural ear plugs or ear molds.
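In essence, binaural rendering reduces to convolving a mono source signal with the pair of head-related impulse responses (HRIRs, the time-domain counterpart of the HRTF) measured for the desired direction. The following minimal Python sketch assumes NumPy/SciPy and HRIRs already loaded from some database; names are illustrative, not taken from any specific toolkit.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono signal at one virtual position by convolving it
    with the left- and right-ear HRIRs for that direction."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    # Stack into a 2-channel (samples x ears) signal for headphone playback.
    return np.stack([left, right], axis=-1)
```

Changing the simulated source position then simply means selecting a different HRIR pair, typically with interpolation between measured directions.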
The Role of Reaching Movements in Sound Localization
In a recent study testing normal-hearing people in an altered listening condition (i.e., one ear plugged), reaching to sounds was shown to reduce localization errors faster and to a greater extent than merely naming the sources' positions, even though in both tasks participants received feedback about the correct position of the sound source after a wrong response. However, exactly which aspects make reaching to sounds more effective than naming remains an open question. The rationale behind the reaching-to-sounds benefit hypothesis is that reaching to sounds requires coordinating different effectors (eyes, head, hand) within a common reference frame.
In turn, this may result in a more stable (or more salient) spatial coding of the sound source location and favor learning of the association between auditory cues and spatial coordinates. Following this reasoning, the key feature of the reaching-to-sounds benefit may be the physical movement of the body (the hand) toward the target space. To test this hypothesis, it is necessary to compare the reaching-to-sounds action with a condition in which the hand is still the effector providing the response, but without it moving into the space occupied by the target source.
For instance, pointing toward a direction (i.e., indicating a far target) requires implementing a motor action with the hand to select a distal stimulus location (as reaching does), but without the hand actually arriving at the target position. More precisely, when pointing toward a sound source, the target object is not coded as a function of hand position within peripersonal space; rather, it is coded only with respect to trunk-centered coordinates, without necessarily involving a remapping of the multisensory spatial representation.
Moreover, it is important to note that motor strategies involving the head, implemented by participants during sound localization tasks, could favor adaptation to an altered hearing experience. Indeed, it has been observed that the reaching-to-sounds benefit was linked to progressively wider head movements used to explore the auditory space. The crucial role played by head movements in sound localization has been demonstrated by several works in the literature over the last decades. The study of these motor behavioral strategies has been facilitated by advanced experimental settings that permit active listening, with and without acoustic virtual reality.
A further aspect that may contribute to the reaching-to-sounds benefit concerns the auditory adaptation mechanisms underlying this effect. Different types of interaction with sound sources could lead to either transient or stable adaptations to altered auditory cues. For instance, during the course of the task, participants might learn that sounds reach the impaired ear with lower intensity and perform the task accordingly, by re-weighting the altered binaural cues. Alternatively, they could succeed in building a new map of the space by learning new correspondences between the available auditory cues and external space coordinates.

Testing participants after exposure to altered auditory cues, to measure whether their performance remains anchored to the previous experience (i.e., an after-effect), could be a first attempt to investigate these adaptation mechanisms, although, to our knowledge, this approach is still exploratory and rarely adopted for stationary sounds.
Experimental Study: Naming, Pointing, and Reaching
In the present study, we addressed this hypothesis directly by testing three groups of normal-hearing participants performing a sound localization task, measuring both their performance and their head movements during listening. The task comprised 4 blocks: blocks 1 and 4 were performed in a normal (binaural) hearing condition, and blocks 2 and 3 in an altered listening condition simulating a mild-to-moderate unilateral hearing loss through auditory virtual reality technology. We instructed participants to either name the label positioned above the speaker (naming group), select the speaker by pointing at it with a laser pointer held in the hand (pointing group), or reach the target speaker by moving the hand holding the controller (reaching group). All three groups were given audio-visual feedback about the sound position only after a wrong response; in block 4, no feedback was provided.
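As a compact summary of the design, the block structure can be encoded as follows (a purely illustrative Python snippet; the field names are ours, not the study's):

```python
# Between-subjects factor: response instruction (14 participants each).
GROUPS = ["naming", "pointing", "reaching"]

# Within-subjects factor: block order and listening condition.
# Feedback about the correct position was given only after wrong
# responses, and never in block 4.
BLOCKS = [
    {"block": 1, "listening": "binaural",        "feedback_on_error": True},
    {"block": 2, "listening": "unilateral_loss", "feedback_on_error": True},
    {"block": 3, "listening": "unilateral_loss", "feedback_on_error": True},
    {"block": 4, "listening": "binaural",        "feedback_on_error": False},
]
```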
To summarize, the goal of this study was to further explore the contribution of reaching to sound sources when adapting to altered auditory cues. To this end, we tested whether, and to what extent, sound localization improved across block repetitions during simulated asymmetrical hearing loss as a function of instruction (naming, pointing, reaching). We predicted a reduction in localization errors across the altered listening blocks for all groups. Regarding error reduction in the Pointing group, we hypothesized that if hand involvement in the interaction is what matters for adaptation, the improvement in the Pointing and Reaching groups would be similar.
We also explored the impact of instruction (naming, pointing, reaching) on the head movement strategies participants implemented. We further investigated the effect of instruction on sound localization learning by testing participants in the binaural condition after they had experienced simulated asymmetrical hearing loss. We hypothesized that if reaching toward sound sources (or pointing to or naming them) leads to a stable change in acoustic space processing, we should not observe any bias resulting from compensatory response behaviors, that is, from systematic adjustments introduced by the participants during the altered listening condition.
Forty-two participants (age: 22.83 ± 2.49 years, range = 18-28) took part in the experiment, including 22 males and 20 females. Five participants were left-handed (2 in the pointing group and 3 in the naming group). All methods were performed in accordance with the Declaration of Helsinki (1964, amended in 2013), and participants provided informed consent. The experimental protocol was approved by the ethics committee of the University of Trento (protocol: 2022-009). Before proceeding with the experimental task, participants reported that they had no visual or motor deficits. In addition, to exclude hearing deficits, hearing thresholds were measured with an audiometer (Grason Stadler GSI 17 Audiometer) at different frequencies (250, 500, 1000, 2000, 4000 Hz), tested separately for both ears. The average threshold among participants was 3.01 ± 3.88 dB HL.
Apparatus and Stimuli
The equipment comprised a Head-Mounted Display (HMD, Oculus Quest 2), its controller, and headphones (Sennheiser HD 650 S over-ear HiFi headphones, frequency range: 10-41,000 Hz). The playback level was calibrated to deliver a signal at approximately 60 dB SPL (A-weighted) measured at the listener's ears. The experiment took place in a soundproofed, partially anechoic booth (Amplifon G2 × 2.5; floor area = 200 × 250 cm, height = 220 cm; background noise level during the task: 25-30 dB SPL, A-weighted).
The virtual scenario, developed with the Unity3D platform (Unity Technologies, San Francisco, CA), was a square room similar in size to the real one. The room was empty, with a door behind the participants and walls rendered as exposed brick. Two lines drawn along the middle of each wall, floor, and ceiling helped participants remain in the center of the room during the experiment. Participants saw 17 speakers distributed in a semicircle in front of them at ear level, 55 cm from the participant's head, spanning about ±80° of visual angle (each speaker positioned 10° from its neighbors). Above each speaker was a numerical label from 1 to 17. These labels changed randomly on a trial-by-trial basis, to avoid anchoring of responses to the previously seen label, to favor orienting of attention to the location of the new label, and to avoid stereotypical responses.
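For concreteness, the geometry of such an array can be generated as in the short sketch below, assuming a head-centered frame with azimuth 0° straight ahead (the coordinate conventions are ours for illustration, not those of the study's Unity project):

```python
import numpy as np

def speaker_positions(radius_m=0.55, n_speakers=17, span_deg=160.0):
    """Return the azimuths (degrees) and head-centered (x, z) positions
    of a frontal semicircular array: 17 speakers from -80 to +80
    degrees, 10 degrees apart, on a 55 cm arc at ear height."""
    azimuths = np.linspace(-span_deg / 2, span_deg / 2, n_speakers)
    az_rad = np.deg2rad(azimuths)
    x = radius_m * np.sin(az_rad)  # lateral offset, positive to the right
    z = radius_m * np.cos(az_rad)  # forward distance from the head
    return azimuths, np.stack([x, z], axis=-1)

azimuths, positions = speaker_positions()
print(azimuths)  # [-80., -70., ..., 70., 80.]
```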
Sound spatialization was performed using the Unity integration of the 3D Tune-In Toolkit, a convolution-based binaural rendering engine. Sounds were spatialized using the HRTF of a KEMAR dummy-head mannequin from the SADIE database, and Interaural Time Differences (ITDs) were customized for each user according to their head circumference. To simulate near-field sound sources (i.e., closer than the distance at which the HRTF was measured), the Interaural Level Differences (ILDs) were corrected using a spherical head model, which also accounts for the acoustic parallax effect. The acoustic environment (a small room) was simulated using the Reverberant Virtual Loudspeakers (RVL) technique, as implemented in the 3D Tune-In Toolkit and recently assessed in a perceptual evaluation study.

The auditory stimulus was white noise, amplitude-modulated at 2 Hz, delivered through headphones and spatialized as if it were emitted by one of the 17 loudspeakers. Note that head movements were permitted during sound emission and that the acoustic stimulus changed coherently with participants' head movements (e.g., turning the head or approaching a speaker).

During the task, two listening conditions were implemented: binaural or altered. The latter simulated a mild-to-moderate unilateral hearing loss, again obtained with the 3D Tune-In Toolkit: signals were processed by a gammatone-based multiband compressor, emulating a hearing loss of 30 dB HL at 250 Hz, increasing to 50 dB HL above 1 kHz.
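To make the altered condition concrete, the sketch below generates the 2 Hz amplitude-modulated noise stimulus and applies the stated audiogram to one channel as a static, FFT-domain attenuation. This is only a simplification: the study used the 3D Tune-In Toolkit's dynamic gammatone multiband compressor, and the 44.1 kHz sample rate and the linear-in-log-frequency interpolation between 250 Hz and 1 kHz are our assumptions.

```python
import numpy as np

FS = 44100  # sample rate in Hz (assumed for illustration)

def am_white_noise(duration_s=2.0, mod_hz=2.0, fs=FS, rng=None):
    """White noise with a 2 Hz amplitude modulation, as in the stimulus."""
    rng = rng or np.random.default_rng()
    t = np.arange(int(duration_s * fs)) / fs
    envelope = 0.5 * (1.0 + np.sin(2.0 * np.pi * mod_hz * t))  # 0..1 at 2 Hz
    return rng.standard_normal(t.size) * envelope

def loss_attenuation_db(freq_hz):
    """Assumed audiogram: 30 dB HL at 250 Hz, rising (linearly in
    log-frequency) to 50 dB HL at 1 kHz, and flat above 1 kHz."""
    f = np.clip(freq_hz, 250.0, 1000.0)
    return 30.0 + 20.0 * np.log2(f / 250.0) / np.log2(1000.0 / 250.0)

def simulate_unilateral_loss(binaural, fs=FS, impaired_ch=1):
    """Attenuate one channel of a (samples x 2) binaural signal
    according to the audiogram, via a static spectral gain."""
    out = binaural.copy()
    spectrum = np.fft.rfft(out[:, impaired_ch])
    freqs = np.fft.rfftfreq(out.shape[0], 1.0 / fs)
    spectrum *= 10.0 ** (-loss_attenuation_db(freqs) / 20.0)
    out[:, impaired_ch] = np.fft.irfft(spectrum, n=out.shape[0])
    return out
```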
The experiment was controlled by a custom patch built with the MaxMSP 8 software (www.cycling74.com). In the Naming condition, the experimenter manually entered the label pronounced by the participant, corresponding to the speaker from which they thought the sound was emitted, whereas in the Pointing and Reaching conditions this information was recorded automatically by the software.
Procedure
After signing the consent form and taking part in the audiometric examination, participants performed the sound localization task. They were invited to sit on a chair placed in the center of the room; the circumference of their head at the level of the ears was measured with a tailor's tape and entered in the application's interface, in order to customize the ITDs. Participants were then instructed to wear the head-mounted display and hold the controller in their right hand (note that even the left-handed participants (N = 5) were instructed to use their right hand). All participants performed the same sound localization task, but under different instructions as a function of their group. Specifically, the sample was divided into 3 groups of 14 people (Fig. 1). The Naming group was instructed to localize sound sources by naming the numerical label located above the speaker. The Pointing group was instructed to localize the sound by directing toward the source a laser pointer emanating from the controller. Finally, the Reaching group was instructed to localize the speaker by reaching it with the controller (i.e., participants extended the arm and hand holding the controller to move it to the perceived source position).
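As an illustration of the ITD customization step, the following sketch uses the classic Woodworth spherical-head formula, deriving the head radius from the measured circumference. This is an illustrative model of the kind of adjustment performed, not necessarily the exact formula implemented in the 3D Tune-In Toolkit.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def itd_from_circumference(circumference_m, azimuth_deg):
    """Woodworth far-field ITD estimate for a spherical head whose
    radius is derived from the measured circumference (r = C / 2*pi)."""
    r = circumference_m / (2.0 * np.pi)
    theta = np.deg2rad(azimuth_deg)
    return (r / SPEED_OF_SOUND) * (theta + np.sin(theta))

# Example: a 57 cm head circumference and a source at 45 degrees to
# the side give an ITD of about 0.39 ms.
print(itd_from_circumference(0.57, 45.0))
```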