Adult Income Prediction
Dataset
A classic census income dataset from the UCI Machine Learning Repository. It contains 48,842 samples with 14 demographic features and a binary label indicating whether annual income exceeds $50K, and it is a standard dataset for social science data analysis.
Dataset Highlights
A large-scale social science dataset suitable for classification modeling from beginner to advanced levels
Real population data
The data comes from the 1994 US Census, containing real demographic information such as age, education, occupation, race, and gender.
Binary income classification
Predict whether an individual's annual income exceeds $50K. Only about 24% of records fall in the >50K class, which makes this a classic dataset for learning about imbalanced classification problems.
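The label imbalance can be measured directly. The sketch below uses pandas, the library named in the getting-started steps. It computes the share of each income label and derives per-class weights with the formula n_samples / (n_classes × n_class_samples), which is the same formula scikit-learn uses for `class_weight="balanced"`.

It assumes the label column is called `income`, as in the data preview. It also strips a trailing period from labels, because the original UCI test file writes them as `>50K.`. Whether this combined CSV keeps that spelling is an assumption.

```python
import pandas as pd


def normalize_labels(income: pd.Series) -> pd.Series:
    # Assumption: labels may carry stray whitespace or a trailing period
    # (">50K." as in the original UCI test file); strip both.
    return income.astype(str).str.strip().str.rstrip(".")


def class_balance(income: pd.Series) -> pd.Series:
    """Share of each income label, e.g. the fraction of '>50K' rows."""
    return normalize_labels(income).value_counts(normalize=True)


def balanced_class_weights(income: pd.Series) -> dict:
    """Weight per class = n_samples / (n_classes * n_class_samples),
    the formula behind scikit-learn's class_weight="balanced"."""
    counts = normalize_labels(income).value_counts()
    n_samples, n_classes = int(counts.sum()), len(counts)
    return {label: n_samples / (n_classes * int(count)) for label, count in counts.items()}
```

The resulting dictionary can be passed as `class_weight` to many scikit-learn classifiers when they are trained on the same normalized labels. This gives the minority `>50K` class more influence during training.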
Mixed feature types
Includes continuous (age, hours worked) and categorical (education, occupation, marital status) features, suitable for comprehensive practice in data preprocessing.
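Before encoding anything, it helps to split the columns by type. This minimal sketch assumes the column names shown in the data preview, and assumes pandas infers integer dtypes for the numeric columns.

The file also contains `fnlwgt` (a census sampling weight) and `education_num` (a numeric encoding of `education`). As a result, the numeric group is larger than the two examples given above.

```python
import pandas as pd

TARGET = "income"  # label column name, as in the data preview


def split_feature_types(df: pd.DataFrame, target: str = TARGET):
    """Return (numeric_columns, categorical_columns), excluding the target."""
    features = df.drop(columns=[target])
    # Integer/float columns count as numeric; everything else (strings) as categorical
    numeric = features.select_dtypes(include="number").columns.tolist()
    categorical = features.select_dtypes(exclude="number").columns.tolist()
    return numeric, categorical
```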
Large sample size
With 48,842 samples, the dataset is large enough for training complex models and running cross-validation experiments.
Fairness research
The data includes sensitive features such as gender and race, making it an important dataset for studying algorithmic fairness and bias detection.
UCI authoritative source
Originating from the UCI Machine Learning Repository, it is one of the most widely used benchmark datasets in machine learning, particularly for social science applications.
Applicable Scenarios
Use cases range from classroom teaching to fairness research
Income prediction
Build a binary classification model to predict individual annual income levels and compare the performance of different algorithms
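Here is a hedged sketch of such a comparison, assuming scikit-learn is installed. Both models share the same preprocessing: numeric columns are scaled and the remaining columns are one-hot encoded, with columns selected by dtype. Each model is then scored with stratified k-fold cross-validation.

Logistic regression and random forest are illustrative choices, not a recommendation from this page. ROC AUC is used instead of accuracy because a model that always predicts `<=50K` already gets most rows right on this imbalanced label. The `income` column name and the trailing-period cleanup are assumptions.

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def compare_models(df: pd.DataFrame, target: str = "income", n_splits: int = 5) -> dict:
    """Mean cross-validated ROC AUC per model, keyed by model name."""
    X = df.drop(columns=[target])
    # 1 for >50K, 0 otherwise; tolerate a trailing period such as ">50K."
    y = (df[target].astype(str).str.strip().str.rstrip(".") == ">50K").astype(int)

    # Scale numeric columns and one-hot encode the rest, chosen by dtype
    preprocess = make_column_transformer(
        (StandardScaler(), make_column_selector(dtype_include="number")),
        (OneHotEncoder(handle_unknown="ignore"), make_column_selector(dtype_exclude="number")),
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    # Stratified folds keep the >50K share roughly equal in every fold
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)

    scores = {}
    for name, model in models.items():
        # cross_val_score clones the pipeline per fold, so sharing `preprocess` is safe
        pipe = make_pipeline(preprocess, model)
        scores[name] = float(cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean())
    return scores
```

On the full file, the random forest is the slower of the two. Reducing `n_estimators` is a simple way to iterate faster while exploring.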
Fairness analysis
Detect differences in predictions across groups such as gender and race, and study algorithmic bias
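A first check for this kind of analysis is to compare how often a model predicts the positive class (>50K) for each group. The sketch below computes these per-group selection rates and the gap between the highest and lowest rate, a metric often called the demographic parity difference.

It assumes predictions are encoded as 0/1 and accepts any grouping column, such as `sex` or `race` from the preview. It is one simple metric, not a complete bias audit.

```python
import numpy as np
import pandas as pd


def selection_rates(y_pred, groups) -> pd.Series:
    """Fraction of positive (1 = >50K) predictions within each group."""
    # np.asarray drops any index so predictions and groups pair up by position
    frame = pd.DataFrame({"pred": np.asarray(y_pred), "group": np.asarray(groups)})
    return frame.groupby("group")["pred"].mean()


def demographic_parity_difference(y_pred, groups) -> float:
    """Largest minus smallest group selection rate (0 means equal rates)."""
    rates = selection_rates(y_pred, groups)
    return float(rates.max() - rates.min())
```

For example, `demographic_parity_difference(model.predict(X_test), X_test["sex"])` summarizes the gap between men and women in one number. Here `model` and `X_test` are hypothetical names for a classifier trained on 0/1 labels and a held-out feature frame.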
Feature engineering
Handle mixed feature types and practice encoding, scaling, and feature selection techniques
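One common way to handle encoding and scaling together is scikit-learn's `ColumnTransformer`. This sketch assumes the column names from the data preview. It standardizes the numeric columns and one-hot encodes the categorical ones.

With `handle_unknown="ignore"`, categories seen only at prediction time become all-zero rows instead of raising an error. A placeholder such as `?`, which the original UCI files use for missing values, is simply treated as one more category.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names as they appear in the data preview
NUMERIC = ["age", "fnlwgt", "education_num", "capital_gain", "capital_loss", "hours_per_week"]
CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "sex", "native_country"]


def build_preprocessor() -> ColumnTransformer:
    """Scale numeric features and one-hot encode categorical ones.
    Columns not listed (such as the income label) are dropped."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
```

Because `education_num` already encodes `education` numerically, keeping both duplicates information. Dropping one of them is a simple feature-selection exercise.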
Data visualization
Explore the relationship between demographic features and income, suitable for exploratory data analysis (EDA) in the social sciences
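A natural first EDA table is the share of high earners within each category of a feature. This sketch assumes the `income` label column from the preview and normalizes labels the same way as the other examples. The function name is illustrative.

```python
import pandas as pd


def high_income_rate_by(df: pd.DataFrame, column: str, target: str = "income") -> pd.Series:
    """Share of '>50K' rows per value of `column`, highest first."""
    # Tolerate stray whitespace and a trailing period such as ">50K."
    is_high = df[target].astype(str).str.strip().str.rstrip(".").eq(">50K")
    return is_high.groupby(df[column]).mean().sort_values(ascending=False)
```

For example, `high_income_rate_by(df, "education")` ranks education levels by their >50K share. If matplotlib is installed, calling `.plot.barh()` on the result turns the table into a bar chart.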
Data Preview
Below are the first few rows of the Adult Income dataset:
age,workclass,fnlwgt,education,education_num,marital_status,occupation,relationship,race,sex,capital_gain,capital_loss,hours_per_week,native_country,income
39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3 Steps to Get Started Quickly
From browsing to analysis, you can start your data science project in just a few minutes
Browse the dataset
View the dataset details on the Ace Data Cloud platform, including field descriptions, sample size, and license information.
Download the data
Download the CSV file (5.3 MB), which combines the original training and test sets.
Load and analyze
Use pandas.read_csv() to load the data and start exploratory analysis and classification modeling.
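A minimal loading sketch follows. The filename `adult.csv` is hypothetical, so use whatever name the download has.

The cleanup options are assumptions carried over from the original UCI files, which pad values with a leading space, mark missing categorical values with `?`, and end test-set labels with a period. If this CSV was already cleaned, the options have no effect.

```python
import pandas as pd


def load_adult(path_or_buffer) -> pd.DataFrame:
    """Load the adult income CSV from a path (e.g. a hypothetical 'adult.csv') or file-like object."""
    df = pd.read_csv(
        path_or_buffer,
        skipinitialspace=True,  # tolerate ", State-gov" style spacing from the UCI files
        na_values="?",          # treat the UCI missing-value marker as NaN
    )
    # Unify labels: the UCI test split writes "<=50K." / ">50K." with a trailing period
    df["income"] = df["income"].str.rstrip(".")
    return df
```

After `df = load_adult("adult.csv")`, calling `df.info()` and `df.isna().sum()` is a quick way to confirm the column types and see which columns have missing values.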
Start exploring income prediction data
A classic social science dataset with an open license, available for immediate download. Nearly 50,000 real census records make it an ideal choice for classification modeling and fairness research.
