Overview
The dataset consists of 13,215 task-based dialogs: 5,507 spoken and 7,708 written, each set created with its own procedure. Each conversation falls into one of six domains: ordering pizza, creating auto repair appointments, setting up ride service, ordering movie tickets, ordering coffee drinks, and making restaurant reservations.
Data Collection
Two-person spoken dialogs were created using a Wizard of Oz methodology in which crowdsourced workers played the role of a 'user' and trained call center operators played the role of the 'assistant'. Users were thus led to believe they were interacting with an automated system when the assistant was in fact a human. As a result, users could express their turns in natural ways but in the context of an automated interface. For the written dialogs, we engaged crowdsourced workers to write the full conversation themselves based on scenarios outlined for each task, thereby playing the roles of both the user and the assistant. In a departure from traditional annotation techniques, dialogs are labeled with simple API arguments, i.e., the slot values required to execute the task transaction, instead of traditional semantic intents and dialog acts.
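To make the API-argument labeling concrete, the following is a minimal Python sketch of how slot values attached to spans of an utterance might be gathered into the arguments for a task transaction. The record layout, the field names (`utterances`, `segments`, `slot`), the slot names, and the helper `collect_api_arguments` are all hypothetical and chosen for illustration only; they are not taken from the released files, so check the actual data before relying on them.

```python
from collections import defaultdict

# Hypothetical dialog record. Field and slot names are assumptions made
# for this illustration, not the dataset's documented schema.
dialog = {
    "domain": "pizza",
    "utterances": [
        {
            "speaker": "USER",
            "text": "I'd like a large pepperoni pizza.",
            "segments": [
                {"text": "large", "slot": "pizza.size"},
                {"text": "pepperoni", "slot": "pizza.topping"},
            ],
        },
        {"speaker": "ASSISTANT", "text": "Sure, anything else?", "segments": []},
    ],
}


def collect_api_arguments(dialog):
    """Gather labeled slot values into a mapping of slot name -> list of values."""
    arguments = defaultdict(list)
    for utterance in dialog["utterances"]:
        # Utterances without labeled spans contribute nothing.
        for segment in utterance.get("segments", []):
            arguments[segment["slot"]].append(segment["text"])
    return dict(arguments)


api_arguments = collect_api_arguments(dialog)
print(api_arguments)
```

Under these assumptions, the labels carry only the values needed to execute the order (size and topping), with no separate intent or dialog-act tags on either turn.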
Citation
Please use the following citation when referencing the dataset:
@inproceedings{byrne-etal-2019-taskmaster,
title = {Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset},
author = {Bill Byrne and Karthik Krishnamoorthi and Chinnadhurai Sankar and Arvind Neelakantan
and Daniel Duckworth and Semih Yavuz and Ben Goodrich and Amit Dubey and Kyu-Young Kim and
Andy Cedilnik},
booktitle = {2019 Conference on Empirical Methods in Natural Language
Processing and 9th International Joint Conference on Natural Language Processing},
address = {Hong Kong},
year = {2019}
}