WSC273 Dataset

WSC273 Pronoun Resolution Dataset

WSC273 is the 273-problem version of the Winograd Schema Challenge: carefully designed pronoun resolution problems whose referents can only be determined with common sense knowledge. It is a classic benchmark for evaluating AI language understanding.

273 Questions · Binary Choices · CC BY 4.0 License · Levesque et al. (2012)
🧩
273
Pronoun Resolution Problems
🎯
2
Candidate Referents
🧠
Common Sense Reasoning
Core Capability Requirement
📜
CC BY 4.0
Open Licensing

Dataset Highlights

A classic AI benchmark with carefully designed pronoun resolution challenges

🏆

Alternative to Turing Test

Proposed by Hector Levesque as an alternative to the Turing Test. The problems are designed so that statistical patterns and simple heuristics should not be enough to solve them, which makes them a test of genuine language understanding.

🔄

Paired Sentence Design

Questions come in pairs that differ by a single keyword, and changing that keyword flips the correct answer. This is meant to keep models from relying on shallow co-occurrence statistics, so they have to understand the semantics.

🧠

Driven by Common Sense Knowledge

Correct answers require physical intuition, social common sense, and causal reasoning, which makes the dataset a widely used standard for evaluating deep language understanding in AI systems.

📐

Strict Format Control

All questions follow a uniform format: a sentence with an ambiguous pronoun, two candidate referents, and one correct answer. This makes standardized evaluation straightforward.
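As a rough illustration of the format described above, one item could be modelled as a small Python record. The class name `WSCItem` and its field names are assumptions made for this sketch, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WSCItem:
    """One problem. Field names are hypothetical, not the official schema."""

    sentence: str                # sentence containing the ambiguous pronoun
    pronoun: str                 # the pronoun to resolve
    candidates: Tuple[str, str]  # exactly two candidate referents
    answer: int                  # index into candidates: 0 or 1

    def __post_init__(self) -> None:
        # Enforce the binary-choice format at construction time.
        if len(self.candidates) != 2:
            raise ValueError("expected exactly two candidate referents")
        if self.answer not in (0, 1):
            raise ValueError("answer must be 0 or 1")

    @property
    def correct_referent(self) -> str:
        """Return the candidate string that the pronoun refers to."""
        return self.candidates[self.answer]


item = WSCItem(
    sentence="The trophy doesn't fit into the brown suitcase because it is too large.",
    pronoun="it",
    candidates=("trophy", "suitcase"),
    answer=0,
)
print(item.correct_referent)
```

Validating in `__post_init__` means a malformed record fails when it is loaded rather than later during scoring.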

📖

Academic Classic

Based on an example from Terry Winograd (1972) and systematically extended by Levesque et al. in 2012, the challenge is widely cited and recognized in NLP research.

⚖️

Fair Evaluation Benchmark

Because every question has exactly two candidates, random guessing yields an expected accuracy of 50%. This gives a clear baseline against which model scores can be interpreted.
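For a sense of scale, the chance baseline on 273 binary questions can be worked out directly. The sketch below assumes a simple binomial model of independent random guesses and uses the normal approximation for the band; it is an illustration, not part of any official evaluation protocol.

```python
import math

N_ITEMS = 273    # number of problems, from the dataset description
P_CHANCE = 0.5   # two candidate referents per problem

# Standard deviation of accuracy for a random guesser (binomial model).
std_acc = math.sqrt(P_CHANCE * (1 - P_CHANCE) / N_ITEMS)

# Approximate 95% band around chance, using the normal approximation.
low = P_CHANCE - 1.96 * std_acc
high = P_CHANCE + 1.96 * std_acc

print(f"std of chance accuracy: {std_acc:.4f}")
print(f"approx. 95% band: {low:.3f} to {high:.3f}")
```

Under these assumptions a random guesser lands between roughly 44% and 56% most of the time, so scores only a few points above 50% are hard to distinguish from chance on a set this small.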

Applicable Scenarios

Widely applicable from language understanding research to AI system evaluation

🔗

Co-reference Resolution

Train and evaluate models to correctly identify the entities that pronouns refer to within sentences, a core task in natural language understanding.

💡

Common Sense Reasoning

Test whether AI systems possess common sense knowledge across several dimensions, such as physical, social, and causal understanding, as a measure of deep semantic comprehension.

🗣️

Pronoun Disambiguation

Resolve ambiguous pronoun references in natural language, crucial for machine translation, dialogue systems, and information extraction.

🧪

AI Completeness Testing

Used as an alternative to the Turing Test to evaluate whether AI systems reach human-level language understanding.

Pronoun Resolution · Common Sense Reasoning · Co-reference Resolution · AI Completeness · Language Understanding

Data Preview

Below are sample questions from the WSC273 dataset, demonstrating how a keyword change in paired sentences flips the correct answer:

TEXT
# Example 1 (Paired Sentences)
Sentence: The trophy doesn't fit into the brown suitcase because it is too large.
Pronoun: it
Candidate A: trophy    Candidate B: suitcase
Correct Answer: A (trophy)

Sentence: The trophy doesn't fit into the brown suitcase because it is too small.
Pronoun: it
Candidate A: trophy    Candidate B: suitcase
Correct Answer: B (suitcase)

# Example 2 (Paired Sentences)
Sentence: Joan made sure to thank Susan for all the help she had given.
Pronoun: she
Candidate A: Joan    Candidate B: Susan
Correct Answer: B (Susan)

Sentence: Joan made sure to thank Susan for all the help she had received.
Pronoun: she
Candidate A: Joan    Candidate B: Susan
Correct Answer: A (Joan)

# Example 3
Sentence: The city councilmen refused the demonstrators a permit because they feared violence.
Pronoun: they
Candidate A: councilmen    Candidate B: demonstrators
Correct Answer: A (councilmen)
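One common way to score items like these with a language model is to substitute each candidate for the pronoun and compare how plausible the resulting sentences are. The sketch below only builds the substituted sentences; the scoring model is left out. The helper name `substitute_pronoun` is hypothetical, and it assumes the ambiguous pronoun is the first whole-word match in the sentence.

```python
import re


def substitute_pronoun(sentence: str, pronoun: str, candidate: str) -> str:
    """Replace the first whole-word occurrence of `pronoun` with `candidate`.

    Assumption: the first whole-word match is the ambiguous pronoun.
    """
    pattern = r"\b" + re.escape(pronoun) + r"\b"
    # A function replacement avoids treating backslashes in `candidate` specially.
    return re.sub(pattern, lambda _match: candidate, sentence, count=1)


sentence = "The trophy doesn't fit into the brown suitcase because it is too large."
for candidate in ("the trophy", "the suitcase"):
    print(substitute_pronoun(sentence, "it", candidate))
```

The word-boundary pattern keeps the "it" inside "fit" from being replaced. The caller supplies any article ("the trophy") because proper names such as "Joan" take none.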

3-Step Quick Start

From browsing to analysis, start your NLP research project in minutes

01

Browse the Dataset

View dataset details on the Ace Data Cloud platform, including question format, field descriptions, and license information.

02

Download Data

Obtain the complete dataset of 273 pronoun resolution problems, each with a sentence, a pronoun, two candidate referents, and the correct answer.

03

Load and Evaluate

Use json.load() or pandas.read_json() to load data and evaluate your language models’ common sense reasoning.
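A minimal sketch of the load-and-evaluate step might look like the following. It parses an inline JSON string so that it runs without the download; with the real file you would call json.load() on an open file handle instead. The field names (`sentence`, `pronoun`, `candidates`, `answer`) and the helper names `evaluate` and `always_first` are assumptions for illustration, so check them against the actual file.

```python
import json
from typing import Callable, Dict, List

# Inline stand-in for the downloaded file. Field names are assumed.
SAMPLE_JSON = """
[
  {"sentence": "The trophy doesn't fit into the brown suitcase because it is too large.",
   "pronoun": "it", "candidates": ["trophy", "suitcase"], "answer": "A"},
  {"sentence": "The trophy doesn't fit into the brown suitcase because it is too small.",
   "pronoun": "it", "candidates": ["trophy", "suitcase"], "answer": "B"}
]
"""


def evaluate(items: List[Dict], predict: Callable[[Dict], str]) -> float:
    """Return the fraction of items where predict(item) equals item['answer']."""
    if not items:
        raise ValueError("no items to evaluate")
    correct = sum(1 for item in items if predict(item) == item["answer"])
    return correct / len(items)


def always_first(item: Dict) -> str:
    """Trivial baseline: always pick candidate A."""
    return "A"


items = json.loads(SAMPLE_JSON)
accuracy = evaluate(items, always_first)
print(f"baseline accuracy: {accuracy:.3f}")
```

Passing the predictor as a function keeps the scoring loop independent of any particular model. Replace `always_first` with a function that wraps your own model.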

Start Exploring the WSC273 Pronoun Resolution Data

A classic AI benchmark with open licensing, available for immediate download. Whether you're an NLP researcher or an AI system developer, this dataset is an essential tool for evaluating language understanding capabilities.