Out-of-language (OOL) detection

>>OOL data<<

These recordings were made internally at BUT (http://speech.fit.vutbr.cz/) with volunteers, for the purpose of out-of-language (OOL) detection.

Available data sets:

  • 0/
    Recorded at the Odyssey 2010 conference, with multiple speakers and multiple languages. It was intended as a reference test for the language identification (LID) system, to show that it works at least to some extent.
    Duration: 1284 s
  • 1/
    Spanish speaker and interview partner (in English), mixed with some Spanish
    Duration: 598 s
  • 2/
    Conversation between a German and a Czech speaker (in English), mixed with some Czech
    Duration: 381 s
  • 3/
    Conversation between two German speakers (in English), mixed with some German
    Duration: 775 s
  • 4/
    Israeli speaker and interview partner (in English), mixed with some Hebrew
    Duration: 567 s
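As a quick check on the amount of data, the durations listed above can be summed. This is a minimal Python sketch; the dictionary name DURATIONS_S is ours and is not part of the package.

```python
# Durations (in seconds) of data sets 0/ to 4/, copied from the list above.
DURATIONS_S = {"0/": 1284, "1/": 598, "2/": 381, "3/": 775, "4/": 567}

total_s = sum(DURATIONS_S.values())
minutes, seconds = divmod(total_s, 60)  # split total into whole minutes and leftover seconds
print(f"total: {total_s} s = {minutes} min {seconds} s")
```

This prints "total: 3605 s = 60 min 5 s", i.e. roughly one hour of audio in total.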

Format of the reference segmentation:

  • Each line of a *.seg file has the form s,e nr_label, e.g. 50.739567,10.1479 5_IL, where
    • s,e are the start and end times of the segment in seconds
    • nr is the segment number
    • label is one of the following reference labels:
      • IL - in language, English speech only
      • ool - up to 50% non-English speech, e.g. a few words in a foreign language
      • OOL - from 50% up to 100% non-English speech
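A minimal Python sketch of a reader for the *.seg format described above. It assumes one segment per line, with the two times separated by a comma and separated from nr_label by whitespace, as in the example; the names Segment, parse_seg_line and read_seg_file are hypothetical and not part of the package. Because ool and OOL differ only in case, labels are compared case-sensitively. The sketch does not check that the end time exceeds the start time, since the example line above (50.739567,10.1479) would fail such a check.

```python
from dataclasses import dataclass

# Reference labels from the format description. "ool" and "OOL" differ
# only in case, so label comparisons must stay case-sensitive.
LABELS = {
    "IL": "in language, English speech only",
    "ool": "up to 50% non-English speech",
    "OOL": "from 50% up to 100% non-English speech",
}


@dataclass
class Segment:
    start: float  # s, in seconds
    end: float    # e, in seconds
    nr: int       # segment number
    label: str    # one of the keys of LABELS


def parse_seg_line(line: str) -> Segment:
    """Parse one 's,e nr_label' line (assumed layout, see lead-in)."""
    times, tag = line.split()          # "50.739567,10.1479", "5_IL"
    s, e = times.split(",")            # start and end times
    nr, label = tag.split("_", 1)      # segment number and label
    if label not in LABELS:
        raise ValueError(f"unknown label {label!r} in line {line!r}")
    return Segment(float(s), float(e), int(nr), label)


def read_seg_file(path: str) -> list[Segment]:
    """Read all non-empty lines of a *.seg file into Segment objects."""
    with open(path, encoding="utf-8") as f:
        return [parse_seg_line(ln) for ln in f if ln.strip()]
```

For the example line, parse_seg_line("50.739567,10.1479 5_IL") returns Segment(start=50.739567, end=10.1479, nr=5, label='IL'). Unknown labels raise ValueError rather than being silently accepted, so a typo such as "Ool" is caught early.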


All data sets are included in the following >>package<<, available for download.