Let's Go Data
We distribute data totaling over 150,000 dialogs. Approximately 18,000 dialogs from 2009 were labeled through crowdsourcing[1]. A smaller amount of data has been hand-labeled.
You may download a CSV file of the transcriptions below. However, because the data set is quite large, we have found that the easiest way to distribute the full data set is to mail hard drives. To receive a full copy, including audio files, please contact James Valenti[2].
Acoustic models:
Below is a link to the pocketsphinx acoustic models currently used by Let's Go.
[1] Parent, G. and Eskenazi, M. Toward Better Crowdsourced Transcription: Transcription of a Year of the Let's Go Bus Information System Data. SLT 2010 [PDF]
[2] Transcriptions
[3] Tiancheng's email: tianchez [at] cs [dot] cmu [dot] edu