XNLI Dataset

Cross-Lingual Natural Language Inference Dataset

A benchmark dataset covering 15 languages for cross-lingual natural language inference, used to evaluate multilingual models’ cross-language understanding and zero-shot transfer capabilities.

112,500+ samples | 15 languages | CC BY-NC 4.0 License | Conneau et al. (2018)
🌐 112,500+ Total Sample Pairs
🗣️ 15 Languages Covered
🏷️ 3 Inference Label Categories
📜 CC BY-NC 4.0 Open License

Dataset Highlights

A standard benchmark for cross-lingual natural language inference, widely used to evaluate multilingual models

🌍

Coverage of 15 Languages

Includes English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili, and Urdu, covering multiple language families.

🔬

High-Quality Annotations

The development and test sets were collected in English and then translated into the other 14 languages by professional translators rather than by machine translation, which keeps translation quality high and labels consistent across languages.

🎯

Zero-Shot Transfer Evaluation

Designed specifically to evaluate zero-shot cross-lingual transfer. A model trained only on English can be tested directly on the other 14 languages, which measures how well it generalizes across languages.

📐

Standardized Format

Each sample has three core fields: premise, hypothesis, and label (the preview on this page also carries a language code). The uniform structure makes the data easy to use directly for model training and evaluation.

📖

Academic and Authoritative Source

Published in 2018 by Conneau et al. from Facebook AI Research, XNLI extends the MultiNLI dataset to 15 languages and has become one of the most widely used benchmarks in cross-lingual NLU.

🏛️

Widely Cited

Used as a core evaluation benchmark for milestone models such as mBERT, XLM, and XLM-R, and a fixture of multilingual pretraining research.
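The premise, hypothesis, and label layout described under Standardized Format can be sketched as a small typed record. This is an illustrative sketch only, not an official schema: the class name NLIExample and its validation step are hypothetical, and the label strings are the ones shown in the preview on this page.

```python
from dataclasses import dataclass

# Label strings as they appear in the preview on this page.
VALID_LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIExample:
    """Hypothetical container for one premise-hypothesis pair."""
    premise: str
    hypothesis: str
    label: str

    def __post_init__(self):
        # Reject anything outside the three inference label categories.
        if self.label not in VALID_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


example = NLIExample(
    premise="And he said, Mama, I'm home.",
    hypothesis="He called out to his mother.",
    label="entailment",
)
```

Validating the label at construction time is a design choice for this sketch; it surfaces malformed records early instead of during evaluation.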

Application Scenarios

From cross-lingual research to multilingual products, XNLI supports several key evaluation tasks

🔄

Cross-Lingual Transfer

Assess a model's inference ability when it is trained on English and transferred zero-shot to other languages, measuring cross-lingual generalization

🧠

Multilingual NLU

Test the natural language understanding capabilities of multilingual pretrained models (such as mBERT and XLM-R) and compare how different architectures perform

📊

Model Benchmarking

Serve as a standard evaluation benchmark for multilingual models, used for paper experiments, leaderboard rankings, and performance tracking

🏗️

Language Model Evaluation

Evaluate the reasoning and semantic understanding capabilities of large language models across languages, identifying performance gaps in low-resource languages
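The scenarios above mostly come down to two numbers: accuracy per language, and how far each target language falls behind the language the model was trained on. A minimal sketch of that bookkeeping follows. The function names are hypothetical, and the gold labels and predictions are toy values invented for illustration, not real XNLI results.

```python
from collections import defaultdict


def accuracy_by_language(gold, predicted):
    """gold: iterable of (language, label); predicted: labels in the same order."""
    correct, total = defaultdict(int), defaultdict(int)
    for (lang, label), guess in zip(gold, predicted):
        total[lang] += 1
        correct[lang] += int(guess == label)
    return {lang: correct[lang] / total[lang] for lang in total}


def transfer_gap(scores, source="en"):
    """Accuracy drop of each target language relative to the source language."""
    return {lang: scores[source] - acc
            for lang, acc in scores.items() if lang != source}


# Toy data, invented for illustration; not real XNLI results.
gold = [("en", "entailment"), ("en", "neutral"),
        ("sw", "entailment"), ("sw", "contradiction")]
predicted = ["entailment", "neutral", "entailment", "neutral"]

scores = accuracy_by_language(gold, predicted)
gaps = transfer_gap(scores)
```

A large positive gap for a language suggests that English training transfers poorly to it, which is one way to flag weak spots in low-resource languages.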

Cross-Lingual Understanding | Multilingual NLP | Natural Language Inference | Zero-Shot Transfer | Language Model Evaluation

Data Preview

Below are sample records from the XNLI dataset, showing premise-hypothesis pairs and their inference labels

JSON
[
  {
    "premise": "And he said, Mama, I'm home.",
    "hypothesis": "He called out to his mother.",
    "label": "entailment",
    "language": "en"
  },
  {
    "premise": "And he said, Mama, I'm home.",
    "hypothesis": "He didn't say a word.",
    "label": "contradiction",
    "language": "en"
  },
  {
    "premise": "Et il a dit, Maman, je suis rentré.",
    "hypothesis": "Il a appelé sa mère.",
    "label": "entailment",
    "language": "fr"
  },
  {
    "premise": "他说,妈妈,我回来了。",
    "hypothesis": "他一句话也没说。",
    "label": "contradiction",
    "language": "zh"
  },
  {
    "premise": "And he said, Mama, I'm home.",
    "hypothesis": "He is outside the house.",
    "label": "neutral",
    "language": "en"
  }
]
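Records in this shape can be read with the standard json module. The sketch below embeds three of the preview records as a string so that it runs on its own; with a downloaded file you would read the text from disk instead, and the actual file name and layout on the platform are not assumed here.

```python
import json
from collections import Counter

# Three records copied from the preview above, embedded to keep the sketch standalone.
preview = """[
  {"premise": "And he said, Mama, I'm home.", "hypothesis": "He called out to his mother.", "label": "entailment", "language": "en"},
  {"premise": "Et il a dit, Maman, je suis rentré.", "hypothesis": "Il a appelé sa mère.", "label": "entailment", "language": "fr"},
  {"premise": "And he said, Mama, I'm home.", "hypothesis": "He is outside the house.", "label": "neutral", "language": "en"}
]"""

records = json.loads(preview)

# Count how many records fall under each label and each language code.
label_counts = Counter(r["label"] for r in records)
language_counts = Counter(r["language"] for r in records)
```

Counting labels and languages before training is a quick sanity check that a file is balanced the way you expect.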

3-Step Quick Start

From browsing to evaluation, get started with your cross-lingual NLU research in minutes

01

Browse Dataset

View dataset details on the Ace Data Cloud platform, including sample distribution across 15 languages, label definitions, and license information.

02

Download Data

Download XNLI data files containing 2,490 validation samples and 5,010 test samples per language (7,500 per language, 112,500 across all 15 languages). Ready to use out of the box.

03

Load and Evaluate

Use datasets.load_dataset("xnli", "en") to load the data (the second argument selects a language configuration such as "en" or "fr") and run inference evaluation experiments on multilingual models.
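Step 03 can be sketched with the Hugging Face datasets library. Treat this as a sketch under stated assumptions rather than a reference implementation: the helper names are hypothetical, the dataset identifier "xnli" is the one given above (newer library versions may expect "facebook/xnli"), and the integer-to-name label order should be verified against features["label"].names on the loaded split.

```python
# Assumed label order for the Hugging Face copy of XNLI; verify with
# dataset.features["label"].names after loading.
LABEL_NAMES = ("entailment", "neutral", "contradiction")


def label_name(label_id):
    """Map an integer label id to its string name."""
    return LABEL_NAMES[label_id]


def load_xnli_split(language="en", split="test"):
    """Load one language configuration of XNLI (downloads data on first use)."""
    from datasets import load_dataset  # requires: pip install datasets
    return load_dataset("xnli", language, split=split)


def print_first_example(language="fr"):
    """Print the first test pair for a language together with its label name."""
    first = load_xnli_split(language, "test")[0]
    print(first["premise"], "|", first["hypothesis"], "|", label_name(first["label"]))
```

Calling print_first_example("fr") needs network access the first time, since the library downloads and caches the data.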

Start Exploring Cross-Lingual Inference Data

XNLI is a benchmark dataset covering 15 languages. Whether you are a multilingual NLP researcher or a language model developer, it is a well-established choice for evaluating cross-lingual capabilities.