In the following subsections, separate results are given for every evaluated model and for each of the three recording locations. The positions of the true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) in the table representing the confusion matrix are as follows:


Summary of evaluation results

The detectors evaluated for the audio and video modalities perform well on the evaluation data set from all three recording locations. No detector falls below 71% on any single location set, and detector performance differs only slightly between recordings from different locations. Over all sets and all detectors, an average of 75.3% of all frames were correctly detected.
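As an illustration of how a "correctly detected frames" percentage relates to the four confusion-matrix entries, the following minimal sketch computes it as (TP + TN) divided by the total number of frames. This reading of the metric is an assumption, as is the frame-weighted pooling over several sets; the report does not state whether its overall average is frame-weighted or a plain mean of per-set results. The class name, function name and all counts are hypothetical and are not taken from the evaluation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConfusionCounts:
    """Frame counts for one detector on one location set (hypothetical container)."""
    tp: int  # frames correctly flagged
    fp: int  # frames flagged although ground truth is negative
    fn: int  # frames missed although ground truth is positive
    tn: int  # frames correctly left unflagged

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        """Fraction of correctly classified frames: (TP + TN) / all frames."""
        if self.total == 0:
            raise ValueError("confusion matrix contains no frames")
        return (self.tp + self.tn) / self.total


def pooled_accuracy(sets: Iterable[ConfusionCounts]) -> float:
    """Frame-weighted accuracy over several sets (assumed pooling scheme)."""
    sets = list(sets)
    total = sum(c.total for c in sets)
    if total == 0:
        raise ValueError("no frames in any set")
    correct = sum(c.tp + c.tn for c in sets)
    return correct / total


# Made-up counts, for illustration only.
set_a = ConfusionCounts(tp=40, fp=10, fn=15, tn=35)
set_b = ConfusionCounts(tp=30, fp=5, fn=5, tn=10)
```

With these made-up counts, `set_a.accuracy()` gives 0.75 and `set_b.accuracy()` gives 0.8. Pooling weights each set by its number of frames, so the pooled value is not the plain mean of the two.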


Within this deliverable, detector models developed by different project partners have been evaluated against the joint DIRAC audio-visual databases. The evaluation database was processed with the detectors, and ground truth was annotated for evaluation. The detectors under evaluation performed well on recordings from all three recording locations, on both the audio and the video input data.