ReCoRD Cloze Reading Comprehension
Dataset
ReCoRD contains 120K cloze-style reading comprehension questions built from CNN and Daily Mail news articles; answering them requires commonsense reasoning to select the correct entity. It is one of the eight tasks in the SuperGLUE benchmark.
Dataset Highlights
A large-scale cloze reading comprehension benchmark that tests commonsense reasoning
Real News Corpus
All passages are from real news reports by CNN and Daily Mail, covering politics, sports, technology, entertainment, and more, with natural and authentic language.
Cloze-style Format
Each query sentence has one key entity replaced with the token @placeholder. The model must choose the correct entity from those mentioned in the passage to fill the blank. The format is concise yet challenging.
Commonsense Reasoning Driven
Surface text matching is not enough to answer. The model must combine world knowledge and commonsense reasoning with an understanding of the passage, which makes the task a test of deeper language understanding.
Core SuperGLUE Task
As an important part of the SuperGLUE benchmark, ReCoRD is a standard test for evaluating pre-trained language models' reading comprehension and reasoning abilities.
Large-scale Annotation
Contains over 120K crowd-validated queries drawn from more than 70K news articles, providing ample data for training and evaluating large language models.
Authoritative Academic Source
Released in 2018 by Zhang et al. (Johns Hopkins University and Microsoft Research), widely cited in the NLP community, and a standard benchmark in reading comprehension research.
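One common way to work with the cloze format described above is to substitute each candidate entity for @placeholder and let a model score the resulting statements. The sketch below shows only the substitution step. The function name fill_query and the shortened query are illustrative and not part of the dataset.

```python
PLACEHOLDER = "@placeholder"

def fill_query(query, candidates):
    """Return one filled-in statement per candidate entity."""
    if PLACEHOLDER not in query:
        raise ValueError("query has no @placeholder token")
    return [query.replace(PLACEHOLDER, candidate) for candidate in candidates]

# Illustrative query and candidates (hypothetical, shortened for display).
query = "President @placeholder has urged lawmakers to act swiftly."
candidates = ["Barack Obama", "Congress"]

for statement in fill_query(query, candidates):
    print(statement)
```

Each filled statement can then be ranked by whatever scoring model you are evaluating; the ranking step is left out here on purpose.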
Use Cases
From model evaluation to academic research, covering a range of NLP needs
Reading Comprehension
Train and evaluate models on accurately extracting entities from news passages to answer cloze questions
Commonsense Reasoning
Test whether models can leverage commonsense knowledge for reasoning beyond simple pattern matching and keyword retrieval
Cloze Tasks
Train models on the standard cloze format to improve entity-level semantic understanding
SuperGLUE Evaluation
Use ReCoRD as part of the SuperGLUE benchmark to evaluate the general language understanding of pre-trained language models systematically
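For the evaluation use cases above, ReCoRD predictions are usually scored with exact match (EM) and token-level F1, taking the best score over all gold answers. The following is a minimal sketch in the SQuAD style, not the official evaluation script. The normalization rules (lowercasing, dropping punctuation and English articles) are an assumption borrowed from that style, and all function names are illustrative.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_over_answers(metric, prediction, answers):
    """Score a prediction against every gold answer and keep the best."""
    return max(metric(prediction, answer) for answer in answers)

print(best_over_answers(exact_match, "Barack Obama", ["Barack Obama"]))
print(best_over_answers(token_f1, "Obama", ["Barack Obama"]))
```

Averaging these per-query scores over a split gives dataset-level EM and F1.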
Data Preview
A sample ReCoRD record, showing the structure of the passage, query, and answers fields:
{
  "passage": {
    "text": "CNN -- The U.S. Senate on Thursday passed a bill that would provide $9.7 billion in flood insurance to victims of Superstorm Sandy. The measure, which passed 62-32, now goes to the House. President Barack Obama has urged Congress to pass the bill quickly.",
    "entities": [
      {"start": 11, "end": 21, "text": "U.S. Senate"},
      {"start": 114, "end": 129, "text": "Superstorm Sandy"},
      {"start": 198, "end": 209, "text": "Barack Obama"},
      {"start": 221, "end": 228, "text": "Congress"}
    ]
  },
  "query": "President @placeholder has urged lawmakers to act swiftly on the flood insurance legislation.",
  "answers": ["Barack Obama"],
  "idx": 0
}
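To show how the entity spans relate to the passage text, here is a small sketch that slices candidate entities out of a record and checks that every gold answer is among them. It assumes character offsets with an inclusive end, as in the SuperGLUE release of ReCoRD. The shortened record and the function names are illustrative only.

```python
# Shortened, hypothetical record in the same layout as the preview.
record = {
    "passage": {
        "text": "President Barack Obama has urged Congress to act.",
        "entities": [
            {"start": 10, "end": 21},
            {"start": 33, "end": 40},
        ],
    },
    "query": "President @placeholder has urged lawmakers to act swiftly.",
    "answers": ["Barack Obama"],
}

def candidate_entities(record):
    """Slice each entity mention out of the passage, keeping first-seen order."""
    text = record["passage"]["text"]
    mentions = []
    for span in record["passage"]["entities"]:
        # Assumption: "end" is inclusive, hence the + 1.
        mention = text[span["start"]: span["end"] + 1]
        if mention not in mentions:
            mentions.append(mention)
    return mentions

def answers_in_candidates(record):
    """Check that every gold answer appears among the passage entities."""
    candidates = set(candidate_entities(record))
    return all(answer in candidates for answer in record["answers"])

print(candidate_entities(record))
print(answers_in_candidates(record))
```

A check like this is a cheap sanity test after downloading: if answers are missing from the candidates, the offset convention is probably different from the one assumed here.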
3 Steps to Get Started Quickly
From browsing to training, start your NLP research in minutes
Browse the Dataset
View dataset details on the Ace Data Cloud platform, including field descriptions, sample size, and SuperGLUE license metadata.
Download Data
Download the training, validation, and test JSON files of ReCoRD. The data structure is clear and fields are well-defined, ready to use out of the box.
Load and Train
Use json.load() or the HuggingFace datasets library to load the data and start fine-tuning, evaluation, and experiments.
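As a rough sketch of the loading step with the standard json module, the function below accepts either a single JSON array or one JSON object per line, since the exact file layout is not stated here and both are assumptions. The field names follow the Data Preview, the helper name load_records is illustrative, and io.StringIO stands in for a file you would open yourself.

```python
import io
import json

REQUIRED_FIELDS = ("passage", "query", "answers")

def load_records(fp):
    """Read records from a file object holding a JSON array or JSON Lines."""
    text = fp.read().strip()
    if text.startswith("["):
        records = json.loads(text)
    else:
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    for i, record in enumerate(records):
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
    return records

# Stand-in for open("train.json", encoding="utf-8"); the file name is hypothetical.
sample = io.StringIO(
    '{"passage": {"text": "A b.", "entities": []}, "query": "@placeholder b.", "answers": ["A"], "idx": 0}\n'
    '{"passage": {"text": "C d.", "entities": []}, "query": "@placeholder d.", "answers": ["C"], "idx": 1}\n'
)
records = load_records(sample)
print(len(records), records[0]["query"])
```

If you load through the HuggingFace datasets library instead, the field layout may differ from the preview, so inspect one record before training.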
Start Exploring the ReCoRD Reading Comprehension Data
A core SuperGLUE benchmark with 120K commonsense reasoning questions. Whether you are evaluating pre-trained models or exploring the frontiers of reading comprehension, ReCoRD is a strong place to start.
