Acoustic Object Localization

For the acoustic localization, recordings from the Zurich data set were evaluated. Because of the particular evaluation scheme, only hit and miss counts are available. Detailed results per recording are given in the table below. In 3297 of the 3756 evaluated frames (87.8%), a localized position matches the annotated position of a speaking person.
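The frame-level scoring described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the evaluation scheme actually used: positions are taken to be 2-D room coordinates in metres, a frame may carry several localized positions and several annotated speakers, and "matches" is read as "lies within a hypothetical tolerance `TOLERANCE_M`". The real matching criterion is not given in this section, and the names `is_hit` and `count_hits` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed 2-D position (x, y) in metres
Frame = Tuple[Sequence[Point], Sequence[Point]]  # (localized positions, annotated speakers)

# Hypothetical tolerance: the real evaluation scheme's matching criterion is not specified.
TOLERANCE_M = 0.5


def is_hit(localized: Sequence[Point], speakers: Sequence[Point],
           tol: float = TOLERANCE_M) -> bool:
    """True if any localized position lies within `tol` of any annotated speaker.

    A frame with no localized position (or no annotated speaker) counts as a miss.
    """
    return any(math.dist(p, s) <= tol for p in localized for s in speakers)


def count_hits(frames: List[Frame], tol: float = TOLERANCE_M) -> Tuple[int, int]:
    """Return (hits, misses) over all frames."""
    hits = sum(1 for localized, speakers in frames if is_hit(localized, speakers, tol))
    return hits, len(frames) - hits


if __name__ == "__main__":
    # Toy frames for illustration only (not Zurich data).
    frames = [
        ([(1.0, 2.0)], [(1.2, 2.1)]),              # close to the speaker -> hit
        ([(4.0, 0.0), (1.1, 2.0)], [(1.0, 2.0)]),  # second estimate matches -> hit
        ([(3.0, 3.0)], [(0.0, 0.0)]),              # far from the speaker -> miss
        ([], [(0.0, 0.0)]),                        # nothing localized -> miss
    ]
    print(count_hits(frames))  # prints (2, 2)
```

Counting a frame as a hit when any estimate matches any speaker mirrors the wording "a localized position matches the annotated position of a speaking person"; a stricter scheme (for example, one estimate per speaker) would need a different rule.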

Zurich data set:
Hits   Misses
3297      459
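The quoted percentage is the hit rate: hits divided by all evaluated frames (hits + misses). A short Python check using the totals from the summary table gives 3297 / 3756 = 0.8778, i.e. 87.8% when rounded to one decimal. The function name `hit_rate` is illustrative only.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of evaluated frames in which a localized position matched a speaker."""
    total = hits + misses
    if total == 0:
        raise ValueError("no frames evaluated")
    return hits / total


# Totals for the Zurich data set, taken from the summary table.
HITS, MISSES = 3297, 459
rate = hit_rate(HITS, MISSES)
print(f"{HITS + MISSES} frames, hit rate {rate:.1%}")  # prints "3756 frames, hit rate 87.8%"
```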

Zurich data set, per recording (session 22_17Mar2010_Zurich_living_lab):

Audio-visual recording                             Correct  False
scene_01_woman_telephone_light_take_a                  435     39
scene_01_woman_telephone_take_c                        394     32
scene_01_woman_telephone_take_d                        367     71
scene_02_knocking_light_take_c                          39     33
scene_02_knocking_take_a                                42     48
scene_02_knocking_take_b                                 6     30
scene_07_fab_standup_talkshimself_light_take_d         261     15
scene_07_fab_standup_talkshimself_light_take_e         225     45
scene_07_fab_standup_talkshimself_take_b               280      8
scene_07_fab_standup_talkshimself_take_c               291      3
scene_16_fab_oov_couch_light_take_c                    165     15
scene_16_fab_oov_couch_take_e                          194     16
scene_16_fab_oov_couch_take_f                          176     16
scene_20_fab_hits_limping_speech_light_take_a           90     18
scene_20_fab_hits_limping_speech_light_take_b           87     21
scene_20_fab_hits_limping_speech_take_c                 84     12
scene_20_woman_hits_limping_speech_light_take_f         49     11
scene_20_woman_hits_limping_speech_take_d               45     15
scene_20_woman_hits_limping_speech_take_e               67     11
Sum                                                   3297    459
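To compare scenes, the per-recording counts can be aggregated by scene prefix (`scene_01`, `scene_02` and so on). The sketch below copies the counts from the per-recording table, treating the column that sums to 3297 as hits and the column that sums to 459 as misses. Grouping by the first two underscore-separated tokens of the recording name is an assumption based on the naming pattern; `scene_of` and `per_scene` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (recording, hits, misses) copied from the per-recording table of
# session 22_17Mar2010_Zurich_living_lab.
RESULTS: List[Tuple[str, int, int]] = [
    ("scene_01_woman_telephone_light_take_a", 435, 39),
    ("scene_01_woman_telephone_take_c", 394, 32),
    ("scene_01_woman_telephone_take_d", 367, 71),
    ("scene_02_knocking_light_take_c", 39, 33),
    ("scene_02_knocking_take_a", 42, 48),
    ("scene_02_knocking_take_b", 6, 30),
    ("scene_07_fab_standup_talkshimself_light_take_d", 261, 15),
    ("scene_07_fab_standup_talkshimself_light_take_e", 225, 45),
    ("scene_07_fab_standup_talkshimself_take_b", 280, 8),
    ("scene_07_fab_standup_talkshimself_take_c", 291, 3),
    ("scene_16_fab_oov_couch_light_take_c", 165, 15),
    ("scene_16_fab_oov_couch_take_e", 194, 16),
    ("scene_16_fab_oov_couch_take_f", 176, 16),
    ("scene_20_fab_hits_limping_speech_light_take_a", 90, 18),
    ("scene_20_fab_hits_limping_speech_light_take_b", 87, 21),
    ("scene_20_fab_hits_limping_speech_take_c", 84, 12),
    ("scene_20_woman_hits_limping_speech_light_take_f", 49, 11),
    ("scene_20_woman_hits_limping_speech_take_d", 45, 15),
    ("scene_20_woman_hits_limping_speech_take_e", 67, 11),
]


def scene_of(recording: str) -> str:
    """Map e.g. 'scene_20_woman_hits_limping_speech_take_d' to 'scene_20' (assumed rule)."""
    return "_".join(recording.split("_")[:2])


def per_scene(results: List[Tuple[str, int, int]]) -> Dict[str, Tuple[int, int]]:
    """Sum hits and misses over all takes of each scene."""
    acc: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for recording, hits, misses in results:
        acc[scene_of(recording)][0] += hits
        acc[scene_of(recording)][1] += misses
    return {scene: (h, m) for scene, (h, m) in acc.items()}


# Consistency check against the "Sum" row of the table.
assert sum(h for _, h, _ in RESULTS) == 3297
assert sum(m for _, _, m in RESULTS) == 459

for scene, (h, m) in sorted(per_scene(RESULTS).items()):
    print(f"{scene}: {h} of {h + m} frames are hits ({h / (h + m):.1%})")
```

Running it shows, for instance, that the knocking scene (scene_02) reaches 87 hits in 198 frames (43.9%), while the other four scenes lie between 82.7% and 93.7%.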