Establishing Standardized Conditions for Sound-Localization Tests: A Multicenter Approach
Sound localization, defined as the ability to determine the direction of a sound, is essential for auditory spatial awareness and is an important aspect of processing auditory stimuli.
The process relies on interaural differences in timing and level, and spectral cues. Two directions are involved in the perception of directionality, i.e., horizontal and vertical, and sound information is required for each perception. In the horizontal direction, the interaural time difference (ITD), which is the difference in time between sounds entering the left and right ears, and the interaural level difference (ILD), which is the difference in sound pressure between the left and right ears, provide important sound information. ITDs and ILDs are mainly used to localize sounds with frequencies below and above 1500 Hz, respectively.
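As a rough illustration of the size of the ITD cue, the sketch below uses Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ). The formula, the head radius of 8.75 cm, and the speed of sound of 343 m/s are assumptions introduced here for illustration; none of them comes from the study itself.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius (not from the study)
SPEED_OF_SOUND_M_S = 343.0    # assumed speed of sound in air at about 20 °C


def woodworth_itd_us(azimuth_deg):
    """Approximate ITD in microseconds for a source at the given azimuth.

    Uses the spherical-head (Woodworth) approximation, intended for
    azimuths between -90 and 90 degrees. The sign follows the azimuth.
    """
    theta = math.radians(abs(azimuth_deg))
    itd_s = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))
    return math.copysign(itd_s * 1e6, azimuth_deg)


if __name__ == "__main__":
    for az in (0, 22.5, 45, 67.5, 90):
        print(f"{az:5.1f} deg -> {woodworth_itd_us(az):6.1f} us")
```

Under these assumptions the ITD grows from zero at the front to roughly 656 µs at 90°, which gives a sense of the time scale the auditory system resolves.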
The interaural phase difference (IPD) is an important auditory spatial cue that becomes particularly useful when the ITD and ILD are not sufficiently reliable. On the other hand, even though ITDs and ILDs cannot be utilized in the vertical midplane, people with normal hearing can perceive direction as they can utilize the increase or decrease in frequency caused by changes in the direction of the sound source. When people hear a sound, the spectral component changes are caused by the reverberation effect of the auricle and the head-shadow effect.
Because a change in the spectral component depends on the direction of the sound source, directional perception can be achieved by detecting these changes. This change in sound frequency that can be used for directional perception is called a spectral cue, which is the sound information needed not only for vertical perception but also for horizontal perception. In general, sound-localization test results are worse in patients with unilateral hearing loss (UHL) than in healthy participants, and directional perception (i.e., the ability to recognize the direction of a sound source) on the side of the hearing-impaired ear is worse than that on the side of the normal-hearing ear.
Previous researchers have used a variety of test stimuli, test environments, loudspeaker arrays, and ages and numbers of subjects to measure the ability to localize sounds. Despite the obvious need for individuals to identify the specific location of a sound source, the variety of approaches suggests that there is no standard process for measuring localization abilities. As there is no gold standard for localization assessment, it is difficult to compare previous studies of localization.
Study Overview
This study aimed to standardize sound-localization testing conditions across facilities in Japan, analyze the impact of early reflected sounds on localization accuracy, and compare outcomes between individuals with normal hearing and those with unilateral hearing loss.
Participants:
- 77 participants with normal hearing (mean age: 36.5 years; range, 20-68 years)
- 45 patients with UHL (mean age: 57.4 years; range, 20-75 years)
Inclusion Criteria:
- Normal Hearing Group: Average hearing level of 25 dB HL or less at four frequencies (500, 1000, 2000, and 4000 Hz) in both ears.
- UHL Group: Average hearing level of 70 dB HL or greater in the worse-hearing ear and 40 dB HL or less in the better-hearing ear.
- Mixed Hearing Loss: Patients with a mean bone-conducted hearing level of 55 dB or greater in the worse-hearing ear were included.
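The air-conduction criteria above can be expressed as a small check. This is a minimal sketch: the function names and the dictionary layout (frequency in Hz mapped to threshold in dB HL) are hypothetical, and the bone-conduction criterion for mixed hearing loss is not modeled.

```python
from statistics import mean

FREQS_HZ = (500, 1000, 2000, 4000)  # the four frequencies named in the criteria


def four_freq_average(thresholds_db_hl):
    """Average hearing level (dB HL) over 500, 1000, 2000, and 4000 Hz."""
    return mean(thresholds_db_hl[f] for f in FREQS_HZ)


def classify(left_thresholds, right_thresholds):
    """Return 'normal', 'UHL', or 'excluded' from air-conduction averages."""
    left_avg = four_freq_average(left_thresholds)
    right_avg = four_freq_average(right_thresholds)
    better, worse = min(left_avg, right_avg), max(left_avg, right_avg)
    if worse <= 25:                     # 25 dB HL or less in both ears
        return "normal"
    if worse >= 70 and better <= 40:    # UHL definition used in the study
        return "UHL"
    return "excluded"


if __name__ == "__main__":
    good_ear = {500: 10, 1000: 10, 2000: 15, 4000: 20}
    poor_ear = {500: 75, 1000: 80, 2000: 85, 4000: 90}
    print(classify(good_ear, good_ear))  # normal
    print(classify(poor_ear, good_ear))  # UHL
```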
This study was conducted from August 2023 to September 2024. The normal-hearing control group comprised 77 participants: seven volunteers from each of the eleven university hospitals involved in the study. In addition, 45 patients with UHL, defined as an average hearing level of 70 dB HL or greater in the worse-hearing ear and 40 dB HL or less in the better-hearing ear, were included in the study.
Both groups (control and UHL) underwent sound-localization testing under the same acoustic conditions and procedures. IBM SPSS Statistics software (version 29; IBM Corp., Armonk, NY, USA) was used for the statistical analyses. The normality of the analyzed data was evaluated using a Q-Q plot. The observed data points aligned closely with the theoretical reference line, indicating that the data conformed to a normal distribution and supporting the use of statistical methods that assume normality in subsequent analyses. The significance level was set at 5%. This study was approved by the Ethics Review Committee of Hiroshima University, Hiroshima, Japan (No. E2022-0269). The Central Ethics Review Committee of Hiroshima University approved the study at other sites. The mean hearing level of the control participants in whom pure-tone audiometry was performed was 9.5 dB HL.
Forty male and 40 female adults, ages 21 to 60 years with normal hearing sensitivity and normal temporal processing abilities, will be used as subjects in this study. All testing will be completed in an IAC sound-treated room, using eight sound field speakers. The speakers will be arranged symmetrically on the wall, positioned within the horizontal plane at 45-degree intervals. The Central Institute for the Deaf Everyday Sentences (Alpiner & Schow, 2000; Healy & Montgomery, 2006) will be used as test stimuli. Five test conditions, one quiet and four noisy listening conditions, will be used to elicit localization.
Methodology
Sound-localization tests were conducted in various environments at 11 facilities. Room sizes, floors, ceilings, wall materials, and door materials were reported for all tested environments. One facility used an anechoic room and the other 10 used soundproof or semi-soundproof rooms. The room sizes ranged from 240 to 453 cm (width), 211-327 cm (height), and 207-512 cm (depth). In the anechoic room, all room surfaces were sound absorbing, and no windows were present.
In the remaining facilities, nine used carpets as flooring material and one used wooden flooring. Only four facilities had sound-absorbing materials on their ceilings. Eight facilities had sound-absorbing materials on their sides and two had concrete sides. Sound-absorbing sponges were attached to the concrete walls and metal parts as a countermeasure against the reflected sound. Five facilities reported having metallic door sections.
Sounds were presented as time-stretched pulses (TSP), which are commonly used in impulse-response measurements in acoustic rooms. The TSP is a sweep sound that can be used to calculate the impulse response in an acoustic room by convolving the reverse TSP signal with the recorded signal. To measure the impulse response of the test room in this experiment, TSPs of 0.5-s duration were presented at 70 dB SPL through a loudspeaker (SRS-XB01, Sony, Tokyo, Japan) placed at a distance of 100 cm at each of the channel (ch) 1-9 positions (Fig 1). They were recorded three times at the participant’s position (110 cm from the floor).
For each loudspeaker position (channels 1-9), TSPs were presented three times, and the recording with the least background noise among the three trials was selected for the evaluation of the impulse response. A sound-level meter (NA-28; Rion, Tokyo, Japan) was used for the measurements. The measured impulse response, 21 ms in duration, contained the direct sound arriving at approximately 3 ms (corresponding to the distance of approximately 100 cm from the speaker to the sound-level meter microphone). Its peak value was normalized to 1, and the reverberation and reflected components appeared as fluctuations within the waveform envelope.
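The arrival time of about 3 ms for the direct sound follows from dividing the path length by the speed of sound. A minimal sketch, assuming a speed of sound of 343 m/s (a value not stated in the source):

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 °C


def arrival_time_ms(path_length_cm, speed_m_s=SPEED_OF_SOUND_M_S):
    """Travel time in milliseconds for sound covering path_length_cm."""
    return path_length_cm / 100.0 / speed_m_s * 1000.0


if __name__ == "__main__":
    # Direct path of about 100 cm from loudspeaker to microphone
    print(f"{arrival_time_ms(100.0):.2f} ms")
```

For the 100 cm direct path this gives a little over 2.9 ms, consistent with the approximately 3 ms reported.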
For each facility, the response time was calculated considering the room size and the margin of error of ± 5 cm for both the sound-level meter and the reflected sound measured from the floor and ceiling, with a corresponding time range identified within the envelopes. Fig 2b shows a representative example of a waveform segment extracted using a 4-7 ms window, which was one of the time ranges used to quantify the envelope area and peak amplitude of the reflected sound.
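One way to estimate when a floor or ceiling reflection should arrive is the image-source construction, in which the reflected path equals the straight line from a mirror image of the loudspeaker to the microphone. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it assumes the loudspeaker sits at the same 110 cm height as the microphone, a speed of sound of 343 m/s, and that the ± 5 cm margin is applied to the total path length.

```python
import math

SPEED_OF_SOUND_CM_PER_MS = 34.3  # assumed: about 343 m/s in air at 20 °C


def reflected_path_cm(horizontal_cm, source_gap_cm, receiver_gap_cm):
    """First-order reflection path length via the image-source construction.

    source_gap_cm and receiver_gap_cm are the perpendicular distances from
    the loudspeaker and the microphone to the reflecting surface.
    """
    return math.hypot(horizontal_cm, source_gap_cm + receiver_gap_cm)


def arrival_window_ms(path_cm, margin_cm=5.0):
    """Earliest and latest arrival times for a path known to +/- margin_cm."""
    return ((path_cm - margin_cm) / SPEED_OF_SOUND_CM_PER_MS,
            (path_cm + margin_cm) / SPEED_OF_SOUND_CM_PER_MS)


if __name__ == "__main__":
    # Floor reflection: 100 cm horizontal distance, both devices 110 cm up
    floor_path = reflected_path_cm(100.0, 110.0, 110.0)
    early, late = arrival_window_ms(floor_path)
    print(f"floor path {floor_path:.1f} cm, window {early:.2f}-{late:.2f} ms")
```

For a ceiling reflection, the same function applies with each gap set to the room height minus the device height.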
The results were divided into four patterns based on the presence or absence of reflected sound effects from the floor and ceiling (i.e., from neither, from the floor only, from the ceiling only, and from both the floor and ceiling). Five combinations of patterns were also used for the loudspeakers, with two, four, six, eight, and ten loudspeakers arranged in symmetrical positions. Correlations among the mean number of correct responses, area of the envelope, and peak values of the reflected sound were analyzed for these 720 patterns.
Furthermore, correlations between the left and right differences in the number of correct responses, the area of the envelope, and the peak values of the reflected sound were analyzed for 480 patterns. The same conditions were used for the presence or absence of reflected sound effects from the ceiling and floor.
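The analysis described above combines two steps: extracting the envelope area and peak within a time window, and correlating those values with the number of correct responses. The following sketch shows one possible form of each step. The rectified (absolute-value) envelope, the rectangle-rule area, and the function names are assumptions made for illustration; the source does not specify how the envelope was derived.

```python
from math import sqrt


def window_metrics(envelope, sample_rate_hz, start_ms, end_ms):
    """Return (area, peak) of the envelope within [start_ms, end_ms).

    Area is a rectangle-rule sum in amplitude x milliseconds.
    Raises ValueError (from max) if the window contains no samples.
    """
    first = round(start_ms * sample_rate_hz / 1000)
    last = round(end_ms * sample_rate_hz / 1000)
    segment = [abs(v) for v in envelope[first:last]]
    dt_ms = 1000.0 / sample_rate_hz
    return sum(segment) * dt_ms, max(segment)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy envelope sampled at 1 kHz (one sample per millisecond)
    toy = [0.0, 0.0, 0.0, 1.0, 0.0, 0.2, 0.4, 0.1, 0.0, 0.0]
    area, peak = window_metrics(toy, 1000, 4, 7)
    print(area, peak)
    # Made-up per-facility values, for illustration only
    print(pearson_r([0.2, 0.5, 0.9], [20.0, 18.0, 15.0]))
```

The numbers in the usage section are invented to exercise the functions and do not represent study data.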
The participant was seated in front of nine loudspeakers (6301NX, Fostex, Foster, Tokyo, Japan) placed at 22.5° intervals in a semi-circle with a 1 m radius (−90° to 90° azimuth) (Fig 1). The following instructions were provided before the test to prevent head movement by the participant:
- Confirm the positions and numbers of speakers in advance.
- Keep the eyes fixed on the speaker directly in front of you.
- Keep the head and neck fixed in the direction of the speaker directly in front of you.
The test was performed as described previously. The stimulus was a 1-s Comité Consultatif International Téléphonique et Télégraphique (CCITT) noise or a low-pass CCITT noise burst with a 100-ms rise/fall time in both cases (Fig 3a). The low-pass CCITT noise (Fig 3b) was created using the Audacity software, which involved filtering the CCITT noise to attenuate it by 48 dB per octave above 1500 Hz. Audacity is a freely available open-source audio recording and editing software that functions across multiple platforms.
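A slope of 48 dB per octave means the attenuation grows by 48 dB for every doubling of frequency above the cutoff. The sketch below computes this idealized asymptotic attenuation only; it does not reproduce the actual response of the Audacity filter, which will deviate from a straight line near 1500 Hz.

```python
import math


def lowpass_attenuation_db(freq_hz, cutoff_hz=1500.0, slope_db_per_octave=48.0):
    """Idealized attenuation (dB) of a low-pass filter above its cutoff."""
    if freq_hz <= cutoff_hz:
        return 0.0
    octaves_above_cutoff = math.log2(freq_hz / cutoff_hz)
    return slope_db_per_octave * octaves_above_cutoff


if __name__ == "__main__":
    for f in (1000, 1500, 3000, 6000):
        print(f"{f} Hz -> {lowpass_attenuation_db(f):.1f} dB")
```

Under this idealization, components at 3000 Hz are reduced by 48 dB and those at 6000 Hz by 96 dB, so the filtered noise carries little energy in the range where ILDs dominate.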
The CCITT noise and low-pass CCITT noise conditions were tested in separate sessions. Participants were not exposed to randomized interleaving of different noise types within the same trial block, thereby avoiding uncertainty in spectral content during testing. The stimulus levels were randomly set (50, 55, and 60 dB SPL). A total of 27 unique stimuli were created by combining three stimulus levels (50, 55, and 60 dB SPL) with nine loudspeaker positions.
Each condition was presented twice in succession as a single trial, and participants responded once per trial. In total, 27 trials were conducted in random order. The tests were presented as source-identification tasks. The loudspeakers were consecutively numbered from 1 to 9 (−90° to 90° azimuth), and the participant had to identify the loudspeaker considered to be the source of the stimulus by these numbers. During the testing, no feedback was provided.
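The trial structure can be sketched as follows: nine loudspeaker numbers mapped to azimuths at 22.5° intervals, crossed with three stimulus levels and shuffled. The function names and the optional seed parameter are hypothetical conveniences, not part of the published procedure.

```python
import itertools
import random

LEVELS_DB_SPL = (50, 55, 60)
CHANNELS = range(1, 10)  # loudspeakers numbered 1 to 9


def channel_to_azimuth_deg(channel):
    """Map loudspeaker number (1-9) to azimuth (-90 to 90 degrees)."""
    if not 1 <= channel <= 9:
        raise ValueError("channel must be between 1 and 9")
    return -90.0 + 22.5 * (channel - 1)


def make_trial_order(seed=None):
    """Return the 27 (channel, level) combinations in random order."""
    trials = list(itertools.product(CHANNELS, LEVELS_DB_SPL))
    random.Random(seed).shuffle(trials)
    return trials


if __name__ == "__main__":
    for channel, level in make_trial_order(seed=1)[:3]:
        print(channel, channel_to_azimuth_deg(channel), level)
```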
The localization accuracy was quantified using the mean deviation score (d) and the bias score (b). The mean deviation score (d) indicates the deviation between the judged azimuth and the sound presentation azimuth with and without bias adjustment, where the bias is the localization error, which is constant across the loudspeakers. Root mean square (RMS) is a statistical metric used to evaluate the precision or variation in directional perception. It is defined as the square root of the mean of the squared differences between the perceived and actual directions.
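The RMS error follows directly from the definition above. The sketch below also computes a bias score and mean deviation scores, but their exact formulas are assumptions: bias is taken as the mean signed error, and the deviation as the mean absolute error with and without the bias subtracted. The source describes these scores only verbally, so the published definitions may differ.

```python
from math import sqrt


def localization_scores(presented_deg, judged_deg):
    """Return RMS error, bias, and mean deviation (raw and bias-adjusted)."""
    errors = [j - p for p, j in zip(presented_deg, judged_deg)]
    n = len(errors)
    rms = sqrt(sum(e * e for e in errors) / n)    # as defined in the text
    bias = sum(errors) / n                        # assumed: mean signed error
    d_raw = sum(abs(e) for e in errors) / n       # assumed: mean absolute error
    d_adjusted = sum(abs(e - bias) for e in errors) / n
    return {"rms": rms, "bias": bias, "d": d_raw, "d_adjusted": d_adjusted}


if __name__ == "__main__":
    presented = [-45.0, 0.0, 45.0]
    judged = [-22.5, 0.0, 67.5]   # invented responses, for illustration only
    print(localization_scores(presented, judged))
```

A listener who is shifted by a constant angle at every loudspeaker would show a large bias and raw deviation but a bias-adjusted deviation near zero, which is why the two forms are reported separately.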

Key Findings
The number of correct responses per loudspeaker (maximum 21: one response per trial × three sound pressure levels × seven participants at each facility) for the sound-localization test is shown for each of the 11 facilities (Fig 4a). The number of correct responses was highest at the 0-degree loudspeaker and gradually decreased as the stimulus location moved toward ±90 degrees. However, the rate of this decline varied from facility to facility. Additionally, Fig 4b shows the difference in the number of correct responses for loudspeakers in symmetrical positions for each facility.
In the analysis of the mean number of correct responses and the area of the envelope, all values showed a negative correlation after 4 ms, and when the negative correlation coefficients were ranked in order from largest to smallest, seven of the bottom 10 patterns were in the range of 4-7 ms after the influence of the floor and ceiling was removed (r = −0.535 to −0.555). A negative correlation was also observed between the peak values of the reflected sound and the average number of correct responses after 4 ms.
There was a positive correlation between the envelope area and the difference in the number of correct responses between the left and right sides. In particular, the difference in the number of correct responses between ch 3 and ch 7 (the symmetrically positioned speakers at −45° and 45°, respectively) showed a strong correlation of r > 0.6 in 79-88% of the items in the envelope area and the 4-10-ms extraction section, without being related to the reflection influence of the floor and ceiling.
Additionally, in terms of the relationship between the peak value of the acoustic waveform and the difference in the number of correct answers between the left and right sides, 10 items in ch 3-7 (−45° and 45°), 10 items in ch 2-8 (−67.5° and 67.5°), and two items in ch 1-9 (−90° and 90°) yielded a correlation coefficient of r > 0.6 in the 4-9-ms extraction section, regardless of the reflected sound from the floor and ceiling.
A strong correlation was observed between the area and the peak value of the reflected sound in the range of 4-7 ms for sounds presented by ch 1 (−90°), 2 (−67.5°), 8 (67.5°), and 9 (90°) speakers (r = 0.636-0.947).
Key Results:
- Localization performance was negatively influenced by early reflections.
- Reflected sound envelope area and peak values within 4-7 ms correlated significantly with reduced accuracy (r = −0.535 to −0.555).
- Participants with normal hearing achieved a root-mean-square error of 2.0° ± 4.8°, whereas participants with unilateral hearing loss exhibited significantly greater errors (68.4° ± 40.7°, p < .001).
- Asymmetries in the left-right response accuracy correlated positively with the reflected sound characteristics (r > 0.6).
- Noise type (normal vs. low-pass CCITT) did not significantly impact performance in either group.
Implications for Testing
Early reflections significantly compromise sound-localization accuracy, particularly in smaller testing environments where reflections overlap with direct sounds. Standardized testing protocols, in which early reflections are controlled, are critical for reliable assessments. The use of sound-absorbing materials can enhance the test precision, particularly in the clinical evaluation of unilateral hearing loss.
| Group | RMS Error |
|---|---|
| Normal Hearing | 2.0° ± 4.8° |
| Unilateral Hearing Loss | 68.4° ± 40.7° |