CoQA Dataset

CoQA Conversational Question Answering Dataset

Stanford CoQA contains 127K question-answer pairs drawn from 8,000 multi-turn conversations across 7 domains, supporting both extractive and free-form conversational question answering.

127K+ Q&A Pairs | 8,000 Conversations | 7 Domains | Reddy et al. (2019)

Dataset Highlights

One of the first large-scale, multi-domain datasets for conversational question answering, built to test whether machines can answer a chain of interconnected questions about a passage.

7 Diverse Domains

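Covers children's stories, literature, middle and high school English exams, news, Wikipedia, Reddit, and science articles. Reddit and science passages appear only in the test set and are reserved for out-of-domain evaluation, so the benchmark directly measures cross-domain generalization.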
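
As a rough illustration, the domain mix of a loaded split can be tallied from each conversation's "source" field. This is a minimal sketch: it assumes the public files use the source values "mctest", "gutenberg", "race", "cnn", and "wikipedia", and the "reddit" and "science" keys are hypothetical placeholders for the hidden test domains.

```python
from collections import Counter

# Assumed mapping from the "source" field to a readable domain name.
# The first five keys are the sources assumed for the public train/dev files;
# "reddit" and "science" are hypothetical keys for the hidden test domains.
SOURCE_TO_DOMAIN = {
    "mctest": "Children's stories",
    "gutenberg": "Literature",
    "race": "Middle/high school exams",
    "cnn": "News",
    "wikipedia": "Wikipedia",
    "reddit": "Reddit",
    "science": "Science",
}


def domain_distribution(conversations):
    """Count conversations per domain using each record's "source" field."""
    counts = Counter()
    for conv in conversations:
        source = conv.get("source", "unknown")
        # Fall back to the raw source string for unmapped values.
        counts[SOURCE_TO_DOMAIN.get(source, source)] += 1
    return counts


sample = [{"source": "mctest"}, {"source": "cnn"}, {"source": "mctest"}]
print(domain_distribution(sample).most_common())
```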

Real Multi-turn Conversations

Each conversation consists of multiple connected Q&A turns. Later questions depend on earlier turns and often require coreference resolution (for example, "Where did it live?" only makes sense after "What was the kitten's name?"), closely mirroring natural dialogue.
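
To make the context dependency concrete, the sketch below flags questions that contain a third-person pronoun and therefore probably refer back to an earlier turn. The pronoun list and the idea of using it as a proxy for coreference are illustrative assumptions, not part of CoQA's annotation.

```python
import re

# Hypothetical pronoun list used as a rough proxy for "needs earlier context".
PRONOUNS = {"it", "its", "he", "him", "his", "she", "her",
            "they", "them", "their"}


def needs_context(question: str) -> bool:
    """Return True if the question contains a pronoun that likely needs coreference."""
    tokens = re.findall(r"[a-z']+", question.lower())
    return any(tok in PRONOUNS for tok in tokens)


turns = ["What was the kitten's name?", "Where did it live?", "Was it alone?"]
print([needs_context(q) for q in turns])  # [False, True, True]
```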

Extractive + Free-form Answers

Answers are free-form text that may copy words from the passage or rephrase them, so models can be trained either to extract or to generate answers. Each answer also comes with a rationale span highlighted in the source passage, which supports model training and evaluation.
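
One way to see the mix of answer styles is to bucket each answer heuristically. This sketch assumes unanswerable questions carry the answer "unknown"; the "extractive" versus "free-form" split (whole-word match of the normalized answer inside the normalized story) is an illustrative heuristic, not the official CoQA answer taxonomy.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def answer_type(answer_text: str, story: str) -> str:
    """Heuristically label an answer as yes/no/unknown, extractive, or free-form.

    "extractive" means the normalized answer occurs as whole words in the
    normalized story; anything else is treated as free-form.
    """
    ans = normalize(answer_text)
    if ans in {"yes", "no", "unknown"}:
        return ans
    # Pad with spaces so "cot" does not match inside "cotton".
    if ans and f" {ans} " in f" {normalize(story)} ":
        return "extractive"
    return "free-form"


story = ("Once upon a time, in a barn near a farm house, there lived "
         "a little white kitten named Cotton.")
for ans in ["Cotton", "In a barn", "No", "A hen and a dog"]:
    print(ans, "->", answer_type(ans, story))
```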

Large Scale, High Quality

Over 127,000 Q&A pairs collected by pairs of crowdworkers (one asking, one answering) under strict quality control; each conversation averages about 15 turns, giving every passage a long chain of related questions.
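
Corpus-level figures such as the total number of Q&A pairs and the average turns per conversation follow directly from the loaded records. A minimal sketch, assuming each record has a "questions" list as in the data preview; the toy data below is made up for illustration.

```python
def corpus_stats(conversations):
    """Return (#conversations, #Q&A pairs, average turns per conversation)."""
    n_conv = len(conversations)
    n_pairs = sum(len(conv["questions"]) for conv in conversations)
    avg_turns = n_pairs / n_conv if n_conv else 0.0
    return n_conv, n_pairs, avg_turns


toy = [
    {"questions": [{"turn_id": i} for i in range(1, 5)]},  # 4 turns
    {"questions": [{"turn_id": i} for i in range(1, 3)]},  # 2 turns
]
print(corpus_stats(toy))  # (2, 6, 3.0)
```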

Significant Academic Impact

Published by Siva Reddy, Danqi Chen, and Christopher D. Manning of the Stanford NLP Group in TACL (2019), CoQA is widely cited and remains a core benchmark for conversational QA research.

Rich Annotations

Each sample includes the story passage, the question sequence, free-form answers, rationale spans, and a source (domain) label, covering what is needed for both training and evaluation.
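
These annotation fields map naturally onto a small typed structure. The sketch below assumes the field names shown in the data preview ("source", "story", "questions", "answers", "turn_id", "input_text", "span_text"); the Turn and Conversation classes are hypothetical helpers, not part of any official CoQA tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    turn_id: int
    question: str
    answer: str     # free-form answer ("input_text" in the answers list)
    rationale: str  # rationale span from the story ("span_text")


@dataclass
class Conversation:
    source: str     # domain label, e.g. "mctest"
    story: str
    turns: List[Turn] = field(default_factory=list)


def parse_conversation(record: dict) -> Conversation:
    """Convert one raw record into typed objects, pairing Q&A by turn_id."""
    answers = {a["turn_id"]: a for a in record["answers"]}
    turns = []
    for q in sorted(record["questions"], key=lambda item: item["turn_id"]):
        a = answers.get(q["turn_id"], {})
        turns.append(Turn(
            turn_id=q["turn_id"],
            question=q["input_text"],
            answer=a.get("input_text", ""),
            rationale=a.get("span_text", ""),
        ))
    return Conversation(source=record["source"], story=record["story"], turns=turns)


record = {
    "source": "mctest",
    "story": "... a little white kitten named Cotton ...",
    "questions": [{"turn_id": 1, "input_text": "What was the kitten's name?"}],
    "answers": [{"turn_id": 1, "input_text": "Cotton",
                 "span_text": "a little white kitten named Cotton"}],
}
conv = parse_conversation(record)
print(conv.turns[0].question, "->", conv.turns[0].answer)
```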

Applicable Scenarios

From academic research to industrial applications, CoQA covers core conversational AI scenarios.

Conversational Question Answering

Train and evaluate QA models that understand multi-turn dialogue, including resolving coreferences and tracking dependencies on earlier turns.
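
One common way to give a model access to earlier turns is to prepend a window of previous question/answer pairs to the current question. This is a minimal sketch of that idea; the "Q:/A:" format, the " || " separator, and the max_history parameter are illustrative assumptions rather than a prescribed input format.

```python
def build_input(questions, answers, turn_index, max_history=2, sep=" || "):
    """Build a text input for questions[turn_index], prefixed with up to
    max_history previous question/answer pairs as dialogue context."""
    start = max(0, turn_index - max_history)
    parts = [f"Q: {questions[i]} A: {answers[i]}" for i in range(start, turn_index)]
    parts.append(f"Q: {questions[turn_index]}")
    return sep.join(parts)


questions = ["What was the kitten's name?", "Where did it live?", "Was it alone?"]
answers = ["Cotton", "In a barn", "No"]
print(build_input(questions, answers, turn_index=2, max_history=1))
# Q: Where did it live? A: In a barn || Q: Was it alone?
```

A longer history gives the model more context for pronouns such as "it", at the cost of longer inputs; the window size is usually tuned on the development set.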

Multi-domain Understanding

Test how well models transfer and generalize across domains such as children's stories, news, and science.
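
Because the official out-of-domain test data is hidden, transfer can also be probed on the public data by holding out one source domain entirely. The leave-one-domain-out split below is an illustrative protocol and an assumption of this sketch, not the official CoQA split.

```python
def leave_one_domain_out(conversations, held_out_source):
    """Split conversations into (train, test), holding out one source domain."""
    train = [c for c in conversations if c["source"] != held_out_source]
    test = [c for c in conversations if c["source"] == held_out_source]
    return train, test


data = [{"source": "mctest"}, {"source": "cnn"},
        {"source": "wikipedia"}, {"source": "cnn"}]
train, test = leave_one_domain_out(data, "cnn")
print(len(train), len(test))  # 2 2
```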

Generative Answers

Train models to generate natural, fluent free-form answers rather than only extracting text spans from the source.
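
Free-form answers are typically scored with word-overlap F1 rather than exact match, so a paraphrased answer still earns partial credit. The sketch below follows the familiar SQuAD-style normalization (lowercasing, removing punctuation and articles); it compares against a single reference, whereas the official CoQA evaluation script also takes multiple human reference answers into account.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, remove punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, gold):
    """Word-overlap F1 between a predicted and a single gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; only one empty scores zero.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(round(f1_score("A hen and a dog", "a chicken and a dog"), 3))  # 0.667
```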

Dialogue System Development

Provide high-quality training and evaluation data for dialogue systems such as customer service agents, educational tutors, and reading assistants.

Tags: Conversational QA | Multi-domain | Free-form Answers | Stanford NLP | Multi-turn Dialogue

Data Preview

The following abridged example shows a multi-turn conversation from the children's stories domain:

JSON
{
  "source": "mctest",
  "domain": "children_stories",
  "story": "Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton. Cotton lived high up in a nice warm place above the barn where all of the hay was stored...",
  "questions": [
    {"turn_id": 1, "input_text": "What was the kitten's name?"},
    {"turn_id": 2, "input_text": "Where did it live?"},
    {"turn_id": 3, "input_text": "Was it alone?"},
    {"turn_id": 4, "input_text": "Who were its friends?"}
  ],
  "answers": [
    {
      "turn_id": 1,
      "input_text": "Cotton",
      "span_text": "a little white kitten named Cotton"
    },
    {
      "turn_id": 2,
      "input_text": "In a barn",
      "span_text": "in a barn near a farm house"
    },
    {
      "turn_id": 3,
      "input_text": "No",
      "span_text": "Cotton had two friends"
    },
    {
      "turn_id": 4,
      "input_text": "A hen and a dog",
      "span_text": "a chicken named Marge and a dog named Lulu"
    }
  ]
}
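
Since questions and answers live in separate lists, it is worth confirming that their turn_id values line up before pairing them. A minimal sketch, assuming turn_id values run 1..n in order as in the preview; check_turn_alignment and print_turns are hypothetical helper names.

```python
def check_turn_alignment(record):
    """Return True if questions and answers both use turn_ids 1..n in order."""
    q_ids = [q["turn_id"] for q in record["questions"]]
    a_ids = [a["turn_id"] for a in record["answers"]]
    expected = list(range(1, len(q_ids) + 1))
    return q_ids == expected and a_ids == expected


def print_turns(record):
    """Print each turn as 'id. question -> answer' after checking alignment."""
    if not check_turn_alignment(record):
        raise ValueError("questions and answers are not aligned by turn_id")
    for q, a in zip(record["questions"], record["answers"]):
        print(f"{q['turn_id']}. {q['input_text']} -> {a['input_text']}")


record = {
    "questions": [{"turn_id": 1, "input_text": "What was the kitten's name?"},
                  {"turn_id": 2, "input_text": "Where did it live?"}],
    "answers": [{"turn_id": 1, "input_text": "Cotton"},
                {"turn_id": 2, "input_text": "In a barn"}],
}
print_turns(record)
# 1. What was the kitten's name? -> Cotton
# 2. Where did it live? -> In a barn
```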

3 Steps to Get Started Quickly

From browsing to research, start your conversational QA project in minutes.

01

Browse the Dataset

View dataset details on the Ace Data Cloud platform, including domain distribution, annotation format, and summary statistics.

02

Download Data

Download the CoQA training and development (validation) JSON files, which contain complete multi-turn conversations, answers, and rationale spans. The test set is not public; it is used for leaderboard evaluation.

03

Load and Train

Parse the files with Python's json.load(), then build conversational QA models or fine-tune and evaluate existing ones.
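
A minimal loader sketch, assuming the full data files wrap conversations in a top-level "data" list (with a "version" field alongside it) while a single conversation object looks like the preview above; load_coqa is a hypothetical helper, and the file name in the usage comment is an assumed local path.

```python
import json


def load_coqa(path):
    """Load a CoQA JSON file and return a list of conversation records.

    Accepts either the full-file layout {"version": ..., "data": [...]}
    or a single conversation object like the data preview.
    """
    with open(path, encoding="utf-8") as f:
        obj = json.load(f)
    if isinstance(obj, dict) and "data" in obj:
        return obj["data"]
    if isinstance(obj, dict):
        return [obj]
    return list(obj)


# Usage (hypothetical local path):
# conversations = load_coqa("coqa-dev-v1.0.json")
# print(len(conversations), conversations[0]["source"])
```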

Start Exploring Conversational QA Data

The Stanford CoQA dataset offers 127K Q&A pairs from multi-turn conversations across 7 domains. Whether you are an NLP researcher or a dialogue system developer, it is a standard benchmark for conversational question answering.