Distance Dependent Head-related Transfer Function Database of KEMAR

Tianshu Qu 1, Zheng Xiao 1, Mei Gong 2, Ying Huang 1, Xiaodong Li 2, Xihong Wu 1
1 Key Laboratory of Machine Perception (Ministry of Education), Peking University
2 Institute of Acoustics, Chinese Academy of Sciences
wxh@cis.pku.edu.cn

Abstract

This paper introduces the measurement and structure of a database of distance-dependent head-related transfer functions. The database was built by measuring head-related transfer functions with high spatial resolution at a total of 6344 spatial points, with distances from 20 to 160 cm, elevations from -40 to 90 degrees, and azimuths from 0 to 360 degrees. The reliability of the database was confirmed by objective and subjective evaluations.

1. Introduction

The head-related transfer function (HRTF) describes the sound transmission characteristics from a sound source to the eardrum in the free field. It is widely applied in room acoustics simulation, three-dimensional (3D) sound visualization, and sound localization in virtual reality technologies. To date, the most accurate and direct way to obtain HRTFs is experimental measurement. Over the past 15 years, several studies have measured HRTFs in the distal region (source located 1 m or farther from the listener), using dummy heads or human subjects [1-4]. More recently, HRTFs have also been measured in the proximal region. In 1998, R. O. Duda used a Bose Acoustimass loudspeaker and the Golay code method to measure proximal-region HRTFs of a sphere that simulated the human head [5], while Calamia used a Tannoy loudspeaker and the MLS method to measure proximal-region HRTFs of KEMAR [6].
However, a loudspeaker cannot be treated as an acoustic point source in proximal-region HRTF measurement, because a small size and a good low-frequency spectral response are always in conflict. Therefore, a sound source suitable for proximal-region HRTF measurement is needed. In 1999, the MIT Research Laboratory of Electronics adopted an approximate acoustic point source, built from an electrodynamic horn driver and a long section of Tygon tubing, for proximal-region HRTF measurement [7]. In 2004 and 2006, Itakura Lab., Nagoya University, used spark noise [8] and a dodecahedral loudspeaker [9] for proximal-region HRTF measurement, and some of their measurement data were published [10].

The acoustic point source has always been a difficult problem in proximal-region HRTF measurement and related research. As mentioned above, the loudspeakers used in the Duda and Calamia studies cannot be considered acoustic point sources. As for the specialized approximate point sources used in the MIT and Itakura studies, the published figures (Fig. 1 and Fig. 2 of the respective papers) show that the local SNR was too low in several frequency bands, which may affect the accuracy of the measured data.

In this research, a spark impulse generator was adopted as the acoustic point source, which overcomes the above problems of the traditional sources. Using the spark impulse generator and a specialized measurement system, a distance-dependent HRTF database of KEMAR was set up in an anechoic chamber. The measurement was carried out at eight distances: 20, 30, 40, 50, 75, 100, 130 and 160 cm. By adjusting azimuth and elevation, HRTFs were measured at 793 locations at each distance, for a total of 6344 spatial points.

2. Measurement procedure

The schematic diagram of the measurement system is shown in Fig. 1. The measurement was conducted in an anechoic chamber, and the subject was KEMAR.
In Fig. 1, '1' represents the spark generator, which is the sound source; '2' represents the reference microphone, which was used to record the sound source signal; '3' is an 'L'-shaped pole that can be moved up and down; '4' is a slidable pole that can be moved forward and backward; '5' is a fixed rail; and '6' is a plate that can be rotated in the horizontal plane.

978-1-4244-1724-7/08/$25.00 ©2008 IEEE. ICALIP2008. Authorized licensed use limited to: Peking University. Downloaded on December 4, 2008 at 05:07 from IEEE Xplore. Restrictions apply.
Fig. 1 Schematic diagram of the measurement system

2.1 Sound source

In this measurement, a specialized spark impulse generator (Type BDMS1-040528, developed by Shanghai Youle Electric Co., Ltd and the Architectural Physics Lab., Tsinghua University; see Fig. 2) was adopted as the approximate acoustic point source. The head of the spark generator is 11.3 mm high and 10.2 mm wide. The nearest distance between the center of the KEMAR dummy head and the sound source in this measurement was 20 cm; at that distance the source is small enough, in terms of volume, to be treated as a point source.

Fig. 2 Spark impulse generator (Type: BDMS1-040528)

Fig. 3 shows the time-domain waveform and the frequency response of the sound source signal emitted by the spark generator. The spark impulse generator performs well in both domains. In the time domain, the duration is short, which makes it easier to attenuate the effect of environmental reflections by truncating the recorded signal. In the frequency domain, the response is nearly flat from 200 Hz to 30 kHz, which confirms that the source signal has enough energy at these frequencies.

Fig. 3 Waveforms of the spark sound source signal in time domain (upper) and frequency domain (lower)

2.2 Measurement environment

The measurements were made in the anechoic chamber at the Institute of Acoustics, Chinese Academy of Sciences, whose cutoff frequency is 70 Hz and whose usable size is 6.5 m × 4.8 m × 3.2 m. The sound source was the BDMS1-040528 spark impulse generator described above. The KEMAR was Knowles Electronics model DB-4004, configured with two neck rings and a torso. It was equipped with two G.R.A.S. 40AG microphones at the positions of the two ears, 26AC preamplifiers, and DB-100 occluded ear simulators with DB-050 ear canal extensions. Two different pinnae were used: the left one was DB-066 and the right one was DB-060.
Because of the symmetry of the KEMAR, two complete sets of HRTFs can be obtained at the same time by sampling the entire sphere. The KEMAR was mounted upright on a rotatable plate, which was fixed at the center of the anechoic chamber and could rotate in the horizontal plane. A special rig, comprising a fixed rail (the x axis) and a slidable pole (the y axis) carrying a slidable 'L'-shaped pole, both with a 2 mm minimum scale, was developed to mount the head of the spark impulse generator accurately at any elevation and distance (see Fig. 1). Because the amplitude of the impulses emitted by the spark generator differed from shot to shot, a G.R.A.S. 40AG microphone (the reference microphone) with a 26AC preamplifier was placed 10 cm below the head of the spark generator to record the sound source signal. The outputs of the microphone preamplifiers were connected to a B&K PULSE 3560C data acquisition system, which recorded the three channels simultaneously at a 65536 Hz sample rate. During the experiment, the turntable and the rig were covered with sound-absorbing plates to reduce their influence on the measurement results.

2.3 Measurement positions

The horizontal plane is defined as the plane passing through the interaural axis (the axis through the two ears). Azimuth is defined as follows: the front is 0 degrees, the right is 90 degrees, the back is 180 degrees and the left is 270 degrees. The elevation angle of the horizontal
plane is 0 degrees. Negative elevation means that the sound source was below the horizontal plane, positive elevation means that it was above that plane, and the point directly overhead is at 90 degrees elevation. Distance is measured from the midpoint of the interaural axis of the dummy head to the head of the spark generator.

The measurement was carried out at eight distances: 20, 30, 40, 50, 75, 100, 130 and 160 cm. The spacing is smaller within the proximal region because HRTFs vary much more dramatically there than in the distal region. At each distance, the spherical space around the KEMAR was sampled at elevations from -40 to 90 degrees in 10-degree steps. At each elevation, a full 360 degrees of azimuth was sampled in equal increments. Table 1 shows the azimuth increment (Step) and the number of samples (Num.) at every elevation. In total, HRTFs were measured at 793 locations at each distance, giving 6344 HRTFs overall.

Table 1 The measurement positions

2.4 Experimental procedure

The measurement procedure is described in Fig. 4. Measuring the impulse response of this system yields the impulse response of the combined system consisting of the impulse generator (BDMS1-040528), the G.R.A.S. 40AG microphones with 26AC preamplifiers, the B&K PULSE 3560C data acquisition system, the anechoic chamber in which the measurements were made and, most importantly, the KEMAR with its accessories. Most room reflections were avoided by the anechoic chamber and the sponge plates that covered the measurement rig. The remaining reflections were excluded by ensuring that they arrive well after the head response time. The non-uniform spectrum of the pulse generator was compensated by dividing the spectra of the signals recorded by the left-ear and right-ear microphones by the spectrum of the signal recorded by the reference microphone. The frequency responses of the three microphones were almost identical.

Fig. 4 HRTFs database generation procedure

The measurement was conducted in the order of distance, elevation and azimuth. That is, HRTFs were first measured at all azimuths for a fixed elevation and a fixed distance; this group is called an azimuth-set. The elevation was then changed and another azimuth-set was measured. After all the HRTFs at that distance had been measured, the distance was changed and the steps above were repeated until all distances were completed. The procedure in detail was as follows.

Given the elevation and distance, the coordinates of the sound source on the x axis and y axis were calculated and the head of the spark impulse generator was set to the proper position. Prior to each azimuth-set measurement, the 0 degrees azimuth was established by adjusting KEMAR to the position where the time delays from the sound source to the two ears were the same. After the 0 degrees azimuth was marked, the HRTFs were measured by rotating KEMAR to each azimuth. At each position, the spark generator emitted 5 impulses in succession and the three microphones recorded simultaneously. The three recorded signals were divided into 5 sets in two steps. First, the five maxima of the reference channel signal were found, one for each impulse. Second, for the first maximum, the sample 64 samples earlier was taken as the starting point, and the 1024 samples following the starting point were kept as the sound source signal. The signals at the same times in the other two channels were kept as the left-ear signal and the right-ear signal, respectively. These three signals form one set. The same processing was applied to the other four maxima. The five sets of signals were then used to compute the HRTFs and the head-related impulse responses (HRIRs), which are the inverse Fourier transforms of the HRTFs.
The signal length of 1024 samples (about 15.6 ms at the 65536 Hz sample rate) was chosen because it is long enough to retain the reflections and diffractions caused by the head, ears and torso.
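The segmentation and source compensation described in Section 2.4 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `min_gap` peak-separation rule, the `eps` guard against division by zero, and the averaging of the five per-impulse HRTFs are all assumptions (the paper does not say how the five sets were combined). The frame is taken as 1024 samples beginning at the starting point, which is one reading of the text. Only the Python standard library is used, so a small radix-2 FFT is included.

```python
import cmath
import math

PRE_SAMPLES = 64   # samples kept before each reference peak (from the text)
FRAME_LEN = 1024   # samples per frame (from the text); 1024 / 65536 Hz = 15.625 ms


def find_impulse_peaks(reference, n_impulses=5, min_gap=FRAME_LEN):
    """Indices of the n largest, well-separated peaks of the reference channel.

    min_gap is an assumption: candidates closer than this to an accepted
    peak are treated as part of the same impulse.
    """
    order = sorted(range(len(reference)), key=lambda i: abs(reference[i]), reverse=True)
    peaks = []
    for idx in order:
        if all(abs(idx - p) >= min_gap for p in peaks):
            peaks.append(idx)
            if len(peaks) == n_impulses:
                break
    return sorted(peaks)


def cut_frames(reference, left, right, peaks):
    """Cut time-aligned frames (reference, left, right) around each peak."""
    sets = []
    for p in peaks:
        start = p - PRE_SAMPLES
        stop = start + FRAME_LEN
        if start < 0 or stop > len(reference):
            continue  # impulse too close to the edge of the recording
        sets.append((reference[start:stop], left[start:stop], right[start:stop]))
    return sets


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def hrtf_from_frame(ear, reference, eps=1e-12):
    """Ear spectrum divided by reference spectrum (eps guard is an assumption)."""
    ear_f, ref_f = fft(ear), fft(reference)
    return [e / (r if abs(r) > eps else eps) for e, r in zip(ear_f, ref_f)]


def average_hrtf(sets, channel):
    """Average the per-impulse HRTFs; channel 1 is left, 2 is right (assumed combination rule)."""
    hrtfs = [hrtf_from_frame(s[channel], s[0]) for s in sets]
    return [sum(bins) / len(hrtfs) for bins in zip(*hrtfs)]


def hrir_from_hrtf(hrtf):
    """HRIR as the real part of the inverse Fourier transform of the HRTF."""
    return [v.real for v in ifft(hrtf)]
```

Dividing by the reference spectrum removes the shot-to-shot amplitude differences of the spark, which is why the sketch normalizes each impulse before any averaging.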
3. The database structure

The HRIRs are stored in the database as double-precision values and are divided into two directories according to the pinna type: one sub-directory is named "DB-060", the other "DB-066". Within each sub-directory, the data are grouped first by distance into distance directories. Each distance directory name has the format "distDDD", where DDD, two or three digits, is the distance in cm, from 20 to 160. Within each distance directory, the data are grouped by elevation into elevation directories. Each elevation directory name has the format "elevEE", where EE, one, two or three digits, is the elevation angle of the source in degrees, from -40 to 90. Within each elevation directory, each filename has the format "aziAAA_elevEE_distDDD.dat", where AAA, one, two or three digits, is the azimuth angle of the source in degrees, from 0 to 355. Each file contains 2048 double values: the first 1024 are the left-ear HRIR and the next 1024 are the right-ear HRIR.

The distance-dependent HRTF database is available from the authors. Anyone interested in this dataset may contact the authors.

4. Results evaluation

The measurement results were compared with the CIPIC database [2] using a spectral distortion (SD) score as an objective measure, given by

$$\mathrm{SD} = \frac{1}{N}\sum_{i=0}^{N} 20\log_{10}\frac{H_s(f_i)}{H_t(f_i)} \qquad (1)$$

where $H_s$ is the magnitude response of our measurement, $H_t$ is the magnitude response of the HRTF from the CIPIC database at the same position as $H_s$, and the sum runs over the frequency points $f_i$.

The magnitude responses of our measured HRTFs for the normal ear model (DB-060) at 100 cm and 0 degrees elevation were compared with those in the CIPIC database. The results are shown in Fig. 5, where PKU&IOA denotes the HRTFs measured in this study. The two sets of data are very similar below 17 kHz. The average SD over the azimuths in the horizontal plane was 5.5 dB, with a variation of 0.98 dB.

Fig. 5 HRTFs comparison between our measurement and CIPIC

4.1 Subjective evaluation

Six subjects with normal and balanced pure-tone hearing thresholds took part in subjective sound localization tests for azimuth, elevation and distance. Drumbeats (2.4 s duration, 44.1 kHz sampling rate) were used as stimuli, presented through headphones (Sennheiser HD600) at a level of 66 dBA. In the azimuth test, the HRTFs measured at 50 cm distance and 0 degrees elevation were used. There were 12 target azimuths, from 0 to 330 degrees in steps of 30 degrees. In the elevation test, the HRTFs measured at 50 cm distance and 90 degrees azimuth were used. There were 5 target elevations, from -30 to 90 degrees in steps of 30 degrees. In the distance test, the HRTFs measured at 0 degrees azimuth and 0 degrees elevation were used. There were 4 target distances: 20 cm, 50 cm, 100 cm and 160 cm. The order of the azimuth, elevation and distance tests was balanced among the 6 subjects according to a Latin square design. Each target was repeated five times, giving 60 sounds in the azimuth test, 25 in the elevation test and 20 in the distance test. In each test, the sounds were presented in random order, and subjects were asked to choose one of the targets after each sound. Practice with feedback was given to each subject before each test. HRTFs used in the practice were not used in the tests.

The results of the localization tests for azimuth, elevation and distance are shown in Figs. 6, 7 and 8. The area of each circle corresponds to the number of answers from the six subjects. The results indicate that the HRTFs measured in this study are effective for 3D perception.
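As an illustration of the file layout in Section 3 and the SD score of Eq. (1), the sketch below builds a file path, reads one HRIR file, and computes the mean log-magnitude ratio. Several points are assumptions rather than facts from the paper: the helper names are hypothetical, the `.dat` files are assumed to be raw little-endian 64-bit floats (the paper only says "double type"), numbers in names are assumed not to be zero-padded, and negative elevations are assumed to appear as, for example, "elev-40". The SD function averages over whatever frequency bins it is given and follows the equation as printed, without an absolute value or squaring.

```python
import math
import struct
from pathlib import Path

HRIR_LEN = 1024  # samples per ear (from the database description)


def hrir_path(root, pinna, azimuth, elevation, distance_cm):
    """Build e.g. <root>/DB-060/dist100/elev0/azi90_elev0_dist100.dat.

    Assumes no zero padding and a leading minus sign for negative elevations.
    """
    name = f"azi{azimuth}_elev{elevation}_dist{distance_cm}.dat"
    return Path(root) / pinna / f"dist{distance_cm}" / f"elev{elevation}" / name


def read_hrir(path, byteorder="<"):
    """Return (left, right) HRIRs from a file of 2048 doubles.

    Raw little-endian binary is an assumption; pass byteorder=">" otherwise.
    """
    raw = Path(path).read_bytes()
    expected = 2 * HRIR_LEN * 8  # 2048 values, 8 bytes each
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    values = struct.unpack(f"{byteorder}{2 * HRIR_LEN}d", raw)
    return list(values[:HRIR_LEN]), list(values[HRIR_LEN:])


def spectral_distortion(mag_s, mag_t):
    """Mean of 20*log10(Hs/Ht) in dB over the supplied frequency bins."""
    if len(mag_s) != len(mag_t) or not mag_s:
        raise ValueError("magnitude responses must be non-empty and equal in length")
    return sum(20.0 * math.log10(s / t) for s, t in zip(mag_s, mag_t)) / len(mag_s)
```

For example, `hrir_path("hrtf_db", "DB-060", 90, 0, 100)` would point at the 90 degrees azimuth, 0 degrees elevation, 100 cm file for the DB-060 pinna, where `hrtf_db` is a placeholder root directory.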
Fig. 6 Results of azimuth localization

Fig. 7 Results of elevation localization

Fig. 8 Results of distance localization

5. Conclusions

In this study, a KEMAR HRTF database with high spatial resolution was built from measurements at 6344 locations, with distances from 20 to 160 cm. The measurement results proved credible in both objective and subjective evaluations. To our knowledge, this database is one of the most complete HRTF databases of KEMAR, and it can be employed in many related fields such as three-dimensional (3D) sound visualization, sound localization and virtual reality technologies.

6. Acknowledgements

This work was supported by the National Natural Science Foundation of China (60305004, 60435010, 60535030, 60605016), the National High Technology Research and Development Program of China (2006AA01Z196, 2006AA010103), and the Trans-Century Training Programme Foundation for the Talents by the State Education Commission.

7. References

[1] W.G.William and K.D. Martin, "HRTF measurements of a KEMAR," J. Acoust. Soc. Am., vol. 97, pp. 3907–3908, 1995.
[2] Algazi A.R. and R.O. Duda, "The CIPIC HRTF database," in Proceedings of 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2001, pp. 99–102.
[3] Cheng C.I., Visualization, Measurement, and Interpolation of Head-Related Transfer Functions (HRTFs) with Applications in Electro-Acoustic Music, Ph.D. thesis, University of Michigan, 2001.
[4] Grassi E., Tulsi J., and Shamma S., "Measurement of head-related transfer functions based on the empirical transfer function estimate," in Proceedings of the 2003 International Conference on Auditory Display, 2003, pp. 119–122.
[5] W.L. Martens R.O. Duda, "Range dependence of the response of a spherical head model," J. Acoust. Soc. Am., vol. 104, pp. 3048–3058, 1998.
[6] Calamia P.T., "Three-dimensional localization of a close-range acoustic source using binaural cues," M.S. thesis, University of Texas at Austin, 1998.
[7] D.S. Brungart and W.M. Rabinowitz, "Auditory localization of nearby sources. Head-related transfer functions," J. Acoust. Soc. Am., vol. 106, pp. 1465–1479, 1999.
[8] K. Takeda T. Nishino, S. Hosoe and F. Itakura, "Measurement of the head related transfer function using the spark noise," in Proc. of ICA2004, 2004, pp. 1437–1438.
[9] S. Hosoe, T. Nishino, K. Itou, and K. Takeda, "Development of micro-dodecahedral loudspeaker for measuring head related transfer functions in the proximal region," in Proc. of ICASSP2006, 2006, pp. 329–332.
[10] http://www.itakura.nuee.nagoya-u.ac.jp/HRTF/