QuAC Conversational Question Answering Dataset
QuAC (Question Answering in Context) contains about 14,000 Wikipedia-based, multi-turn question answering dialogues that simulate information-seeking conversations between a student and a teacher. It is a widely used benchmark for conversational reading comprehension.
Dataset Highlights
A benchmark for conversational reading comprehension, covering real multi-turn interactive scenarios
Student-Teacher Dialogue Mode
Simulates real information-seeking scenarios: a student asks questions to explore an unfamiliar topic, and a teacher answers with short excerpts from a Wikipedia paragraph, resulting in natural and fluent conversations.
Multi-turn Context Dependency
Each dialogue contains an average of 7 QA turns, where subsequent questions depend on previous context, challenging models to track context and resolve coreferences.
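One common way to expose earlier turns to a model is to prepend recent question-answer pairs to the current question. The sketch below is only an illustration of that idea: the function name `build_inputs`, the `max_history` parameter, and the "Q: ... A: ..." prompt layout are assumptions made here, not part of the dataset or of any official baseline.

```python
from typing import Dict, List


def build_inputs(turns: List[Dict[str, str]], max_history: int = 2) -> List[str]:
    """Prefix each question with up to `max_history` previous QA pairs.

    Hypothetical helper: the prompt layout is an illustrative choice.
    """
    inputs = []
    for i, turn in enumerate(turns):
        # Previous turns that fall inside the history window.
        history = turns[max(0, i - max_history):i]
        parts = [f"Q: {h['question']} A: {h['answer']}" for h in history]
        parts.append(f"Q: {turn['question']}")
        inputs.append(" ".join(parts))
    return inputs


turns = [
    {"question": "When did Daffy Duck first appear?", "answer": "April 17, 1937"},
    {"question": "Who directed that cartoon?", "answer": "Tex Avery"},
    {"question": "Who animated it?", "answer": "Bob Clampett"},
]

for text in build_inputs(turns, max_history=1):
    print(text)
```

With the history attached, a question such as "Who directed that cartoon?" carries the earlier turn that tells the model which cartoon is meant. A larger `max_history` gives more context at the cost of longer inputs.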
Wikipedia Knowledge Base
All dialogues are based on over 8,600 Wikipedia paragraphs covering diverse domains such as people, history, and science, providing broad knowledge coverage.
Extractive Answer Spans
Each answer is annotated as an exact text span from the original passage, supporting standard reading comprehension evaluation methods.
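Because answers are spans of the passage, they can be located by string search and scored with token-overlap F1. The following is a simplified sketch: `find_span` and `token_f1` are hypothetical helper names, and the tokenization here is plain lowercased whitespace splitting, which is an assumption and is less thorough than the normalization used by official evaluation scripts.

```python
from collections import Counter
from typing import Tuple


def find_span(context: str, answer: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of `answer` in `context`, or (-1, -1)."""
    start = context.find(answer)
    if start == -1:
        return (-1, -1)
    return (start, start + len(answer))


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 using lowercased whitespace tokens (simplified)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


context = "The cartoon was directed by Tex Avery and animated by Bob Clampett."
start, end = find_span(context, "Tex Avery")
print(context[start:end])
print(token_f1("Tex", "Tex Avery"))
```

A partial prediction such as "Tex" against the reference "Tex Avery" gets partial credit, which is why F1 is usually reported alongside exact match for span-based tasks.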
Dialogue Behavior Annotations
Includes follow-up question flags and unanswerable question markers, providing rich metadata to support research on dialogue strategies.
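A quick way to get a feel for these annotations is to tally them per dialogue. The sketch below assumes each turn carries a `follow_up` field as in the data preview on this page. The unanswerable marker is passed as a parameter: the default `"CANNOTANSWER"` follows the original QuAC release, and whether this platform's file uses the same string is an assumption to verify against the field descriptions.

```python
from collections import Counter
from typing import Dict, List


def summarize_flags(turns: List[Dict[str, str]],
                    unanswerable_marker: str = "CANNOTANSWER") -> Dict[str, object]:
    """Count follow-up flag values and unanswerable turns in one dialogue.

    Hypothetical helper; `unanswerable_marker` is an assumed convention.
    """
    follow_ups = Counter(t.get("follow_up", "missing") for t in turns)
    unanswerable = sum(1 for t in turns if t.get("answer") == unanswerable_marker)
    return {"follow_up": dict(follow_ups), "unanswerable": unanswerable}


turns = [
    {"question": "When did Daffy Duck first appear?", "answer": "April 17, 1937", "follow_up": "y"},
    {"question": "Who directed that cartoon?", "answer": "Tex Avery", "follow_up": "y"},
    {"question": "Who animated it?", "answer": "Bob Clampett", "follow_up": "y"},
    {"question": "How did he get his name?", "answer": "Daffy's name was given to him by Mel Blanc", "follow_up": "n"},
]

print(summarize_flags(turns))
```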
Authoritative Academic Source
Released at EMNLP 2018 by researchers from the University of Washington, the Allen Institute for AI (AI2), Stanford University, and UMass Amherst. It is widely cited and used as a benchmark in academia and industry.
Applicable Scenarios
From academic research to industrial applications, covering various conversational understanding needs
Conversational Question Answering
Train and evaluate QA systems capable of understanding context and tracking topics across multiple dialogue turns
Dialogue Systems
Build conversational agents with information-seeking capabilities that support deeper, more useful chatbot interactions
Contextual Understanding
Study challenges in dialogue context modeling, such as coreference resolution, ellipsis recovery, and topic shifts
Multi-turn Reasoning
Evaluate models' abilities in cross-turn reasoning, information aggregation, and progressive understanding
Data Preview
The example below shows a typical multi-turn conversational QA dialogue between a student and a teacher, based on a Wikipedia paragraph.
{
  "dialog_id": "C_6c5f277c0eef4b6e9e24b5e2b063673a_1",
  "wikipedia_page_title": "Daffy Duck",
  "background": "Daffy Duck is an animated cartoon character...",
  "section_title": "Early years",
  "context": "The earliest version of Daffy Duck appeared in the cartoon Porky's Duck Hunt, released on April 17, 1937. The cartoon was directed by Tex Avery and animated by Bob Clampett. Daffy's name was given to him by Mel Blanc, who provided his original voice...",
  "turns": [
    {
      "turn_id": 0,
      "question": "When did Daffy Duck first appear?",
      "answer": "April 17, 1937",
      "follow_up": "y"
    },
    {
      "turn_id": 1,
      "question": "Who directed that cartoon?",
      "answer": "Tex Avery",
      "follow_up": "y"
    },
    {
      "turn_id": 2,
      "question": "Who animated it?",
      "answer": "Bob Clampett",
      "follow_up": "y"
    },
    {
      "turn_id": 3,
      "question": "How did he get his name?",
      "answer": "Daffy's name was given to him by Mel Blanc",
      "follow_up": "n"
    }
  ]
}
3-Step Quick Start
From browsing to analysis, start your conversational understanding research in minutes
Browse the Dataset
View dataset details on the Ace Data Cloud platform, including dialogue structure, field descriptions, and license information.
Download Data
Obtain the JSON file containing 14,000 multi-turn dialogues, with a clear data structure, ready to use without additional preprocessing.
Load and Analyze
Use json.load() to load the data, parse QA pairs by dialogue turns, and start building your conversational understanding model.
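As a minimal sketch of this step, the code below loads a file with `json.load()` and iterates over QA pairs turn by turn. It assumes records shaped like the data preview on this page. The file name `quac_sample.json`, the helper names, and the handling of a top-level list versus a single object are all assumptions, since the layout of the downloaded file is not specified here. A small sample file is written first so the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_dialogues(path):
    """Load dialogues from a JSON file; accept a list or a single record."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]


def iter_qa_pairs(dialogues):
    """Yield (dialog_id, turn_id, question, answer) for every turn."""
    for dialogue in dialogues:
        for turn in dialogue.get("turns", []):
            yield dialogue["dialog_id"], turn["turn_id"], turn["question"], turn["answer"]


# Stand-in for the downloaded file (hypothetical name and contents).
sample = [{
    "dialog_id": "C_6c5f277c0eef4b6e9e24b5e2b063673a_1",
    "turns": [
        {"turn_id": 0, "question": "When did Daffy Duck first appear?", "answer": "April 17, 1937"},
        {"turn_id": 1, "question": "Who directed that cartoon?", "answer": "Tex Avery"},
    ],
}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "quac_sample.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    dialogues = load_dialogues(path)

pairs = list(iter_qa_pairs(dialogues))
for pair in pairs:
    print(pair)
```

To use it on real data, replace the temporary file with the path to the downloaded JSON and adjust the field names if the actual schema differs from the preview.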
Start Exploring Conversational QA Data
An authoritative benchmark dataset with an open license, available for immediate download. Whether you are an NLP researcher or a dialogue system developer, QuAC is an indispensable research resource.
