MultiNLI: Multi-Genre Natural Language Inference Dataset
A natural language inference dataset with 433K sentence pairs covering 10 text genres, designed to test how well models' language understanding and reasoning generalize across domains.
Dataset Highlights
A large-scale multi-genre natural language inference benchmark for comprehensive cross-domain language understanding evaluation
Multi-Genre Coverage
Includes texts from 10 different genres such as fiction, government reports, letters, travel guides, and telephone conversations, thoroughly testing models' understanding of various registers and writing styles.
Large-Scale Annotation
Contains 433,000 sentence pairs annotated via crowdsourcing. Each pair consists of a premise and a hypothesis labeled as entailment, contradiction, or neutral, making it one of the largest natural language inference corpora available.
Cross-Domain Evaluation
Provides matched and mismatched development sets to separately test in-domain and out-of-domain generalization. The matched set draws on genres that appear in the training data, while the mismatched set draws on genres held out of training.
Transfer Learning Friendly
Widely used for fine-tuning and evaluating pretrained language models, serving as a core benchmark dataset for models like BERT, RoBERTa, and GPT.
Academic Authority
Released in 2018 by Williams et al. at New York University. The accompanying paper is widely cited, and the dataset is one of the most influential in the natural language inference field.
Open Source
Drawn largely from the Open American National Corpus, with publicly transparent data sources and a permissive license suitable for academic research and commercial applications.
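The matched and mismatched evaluation described under Cross-Domain Evaluation can be sketched in a few lines of Python. This is a minimal illustration rather than official tooling: the function names `accuracy` and `generalization_gap` are hypothetical, and the gold and predicted labels are toy values invented for the example, not real model output.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def generalization_gap(matched_acc: float, mismatched_acc: float) -> float:
    """Positive values mean the model scores lower on the held-out genres."""
    return matched_acc - mismatched_acc


# Toy labels, invented for illustration only.
matched_gold = ["entailment", "neutral", "contradiction", "entailment"]
matched_pred = ["entailment", "neutral", "contradiction", "neutral"]
mismatched_gold = ["entailment", "neutral", "contradiction", "entailment"]
mismatched_pred = ["entailment", "contradiction", "contradiction", "neutral"]

matched_acc = accuracy(matched_gold, matched_pred)
mismatched_acc = accuracy(mismatched_gold, mismatched_pred)
gap = generalization_gap(matched_acc, mismatched_acc)

print(f"matched={matched_acc:.2f} mismatched={mismatched_acc:.2f} gap={gap:.2f}")
```

Reporting the two scores side by side, rather than a single pooled number, is what makes the in-domain versus out-of-domain comparison visible.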
Use Cases
From model evaluation to application deployment, covering core NLU tasks
Textual Entailment
Determine the logical relationship between a premise and a hypothesis in order to train natural language inference models and evaluate their reasoning accuracy.
Cross-Domain NLU
Use multi-genre data to test models' generalization and robustness across different text domains.
Transfer Learning
Serve as an upstream task for fine-tuning pretrained models, improving performance on downstream NLP tasks such as sentiment analysis and question answering.
Sentence Understanding
Analyze semantic relationships between sentences, applicable to semantic similarity calculation, paraphrase detection, and reading comprehension tasks.
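For the Cross-Domain NLU use case, one simple way to see where a model is weak is to break accuracy down by the genre field. The sketch below assumes predictions are already available as label strings. The helper name `accuracy_by_genre` and the toy rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_genre(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Group (genre, gold_label, predicted_label) triples by genre and score each group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for genre, gold, pred in rows:
        total[genre] += 1
        correct[genre] += int(gold == pred)
    return {genre: correct[genre] / total[genre] for genre in total}


# Toy rows, invented for illustration only.
toy_rows = [
    ("fiction", "entailment", "entailment"),
    ("fiction", "neutral", "contradiction"),
    ("government", "contradiction", "contradiction"),
    ("telephone", "contradiction", "neutral"),
]

per_genre = accuracy_by_genre(toy_rows)
for genre, score in sorted(per_genre.items()):
    print(f"{genre}: {score:.2f}")
```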
Data Preview
Below are example sentence pairs from five different genres, including premise, hypothesis, label, and genre fields.
// fiction genre example
{
  "premise": "The old man the boats.",
  "hypothesis": "The boats were manned by the elderly.",
  "label": "entailment",
  "genre": "fiction"
}
// government genre example
{
  "premise": "The Commission has specific enforcement authority under the Act.",
  "hypothesis": "The Commission lacks any enforcement power.",
  "label": "contradiction",
  "genre": "government"
}
// telephone genre example
{
  "premise": "Yeah, I think that's a good idea.",
  "hypothesis": "I do not think that is a wise plan.",
  "label": "contradiction",
  "genre": "telephone"
}
// travel genre example
{
  "premise": "The temple was built in the 14th century.",
  "hypothesis": "The structure has historical significance.",
  "label": "entailment",
  "genre": "travel"
}
// slate genre example
{
  "premise": "The author argues that policy changes are needed.",
  "hypothesis": "There is no mention of policy in the article.",
  "label": "contradiction",
  "genre": "slate"
}
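Records shaped like the preview above can be parsed and sanity-checked with the Python standard library. The sketch below assumes one JSON object per line (JSONL) using the four keys shown in the preview, and reuses two of the preview records as inline sample data. The `NLIExample` class and `parse_jsonl` helper are hypothetical names, and files obtained from other sources may use different key names.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import List

# The three labels named in the dataset description.
VALID_LABELS = {"entailment", "contradiction", "neutral"}


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str
    genre: str


def parse_jsonl(text: str) -> List[NLIExample]:
    """Parse JSONL text into examples, rejecting labels outside the three classes."""
    examples = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw = json.loads(line)
        if raw["label"] not in VALID_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {raw['label']!r}")
        examples.append(
            NLIExample(raw["premise"], raw["hypothesis"], raw["label"], raw["genre"])
        )
    return examples


# Two records copied from the preview, written one object per line.
sample = """
{"premise": "The Commission has specific enforcement authority under the Act.", "hypothesis": "The Commission lacks any enforcement power.", "label": "contradiction", "genre": "government"}
{"premise": "Yeah, I think that's a good idea.", "hypothesis": "I do not think that is a wise plan.", "label": "contradiction", "genre": "telephone"}
"""

examples = parse_jsonl(sample)
label_counts = Counter(example.label for example in examples)
genre_counts = Counter(example.genre for example in examples)
print(label_counts)
print(genre_counts)
```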
3-Step Quick Start
From browsing to modeling, start natural language inference experiments in minutes
Browse the Dataset
View dataset details on the Ace Data Cloud platform, including the distribution of 10 genres, label explanations, and data scale metadata.
Download Data
Obtain matched and mismatched development sets as well as the full training set, ready to use in JSONL format without extra cleaning.
Load and Model
Quickly load with datasets.load_dataset("multi_nli") and start training an NLI classifier or fine-tuning pretrained models.
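The loading step can be sketched as follows, assuming the Hugging Face `datasets` library is installed and the `"multi_nli"` identifier from the step above resolves on the Hub. The split names (`train`, `validation_matched`, `validation_mismatched`) and the integer label order are assumptions based on the Hub version of the dataset and are worth confirming against `dataset["train"].features` after loading. The helper names are hypothetical.

```python
# Assumed mapping from integer label ids to names for the Hub version of the
# dataset. Verify with dataset["train"].features["label"].names after loading.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def id_to_label(label_id: int) -> str:
    """Translate an integer class id into its label name."""
    return LABEL_NAMES[label_id]


def load_multinli():
    """Download (or reuse the local cache of) the dataset and return all splits."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset("multi_nli")


def show_first_training_example() -> None:
    """Print the split sizes and the first training pair. Needs network access."""
    dataset = load_multinli()
    for split_name, split in dataset.items():
        print(split_name, len(split))
    first = dataset["train"][0]
    print(first["genre"], "|", first["premise"], "|", first["hypothesis"])
    print("label:", id_to_label(first["label"]))
```

Calling `show_first_training_example()` triggers the download on first use. From there, the two validation splits can be scored separately to compare in-domain and out-of-domain accuracy.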
Start Exploring Multi-Genre Inference Data
A large-scale multi-genre dataset with an open license, available for immediate download. Whether you are an NLP researcher or engineer, MultiNLI is the ideal choice for evaluating cross-domain language understanding ability.
