<style>
.fineweb-page * { box-sizing: border-box; }
.fineweb-page h1, .fineweb-page h2, .fineweb-page h3, .fineweb-page h4, .fineweb-page h5, .fineweb-page h6, .fineweb-page p, .fineweb-page ul, .fineweb-page ol, .fineweb-page li, .fineweb-page pre, .fineweb-page blockquote, .fineweb-page table, .fineweb-page td, .fineweb-page th { margin: 0; padding: 0; }
.fineweb-page {
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
color: var(--el-text-color-primary);
background: var(--el-bg-color);
line-height: 1.6;
}
.fineweb-page a { text-decoration: none; color: inherit; }
.fineweb-page a:hover { text-decoration: none; }
.fineweb-page ul { list-style: none; }
.markdown-body .fineweb-page a { color: inherit !important; text-decoration: none !important; }
.markdown-body .fineweb-page a:hover { text-decoration: none !important; }
.markdown-body .fineweb-page a.s-btn-primary,
.markdown-body .fineweb-page a.btn-cta-light { color: #ffffff !important; }
.markdown-body .fineweb-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
.markdown-body .fineweb-page a.btn-cta-ghost { color: #94a3b8 !important; }
.markdown-body .fineweb-page a.btn-cta-ghost:hover { color: #e2e8f0 !important; }
.markdown-body .fineweb-page h1, .markdown-body .fineweb-page h2 { border-bottom: none !important; padding-bottom: 0 !important; }
.fineweb-page .s-container { max-width: 1200px; margin: 0 auto; padding: 0 24px; }
.fineweb-page .s-container-narrow { max-width: 800px; margin: 0 auto; padding: 0 24px; }
.fineweb-page .s-container-wide { max-width: 1100px; margin: 0 auto; padding: 0 32px; }
.fineweb-page .s-section { padding: 80px 0; }
.fineweb-page .s-section-lg { padding: 100px 0; }
.fineweb-page .s-section-sm { padding: 48px 0; }
.fineweb-page .s-bg-white { background: var(--el-bg-color); }
.fineweb-page .s-bg-gray { background: var(--el-bg-color-page); }
.fineweb-page .s-bg-dark { background: #0f172a; color: #f8fafc; }
.fineweb-page .s-header { text-align: center; margin-bottom: 64px; }
.fineweb-page .s-header h2 {
font-size: clamp(28px, 4vw, 40px);
font-weight: 700;
color: var(--el-text-color-primary);
letter-spacing: normal;
margin-bottom: 20px;
line-height: 1.15;
}
.fineweb-page .s-header p {
font-size: clamp(16px, 2vw, 18px);
color: var(--el-text-color-regular);
max-width: 640px;
margin: 0 auto;
line-height: 1.6;
}
.fineweb-page .s-bg-dark .s-header h2 { color: #f8fafc; }
.fineweb-page .s-bg-dark .s-header p { color: var(--el-text-color-secondary); }
.fineweb-page .s-btn-primary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: #8b5cf6; color: #ffffff !important;
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: background 0.2s, transform 0.15s;
border: none; cursor: pointer;
text-decoration: none !important;
}
.fineweb-page .s-btn-primary:hover { background: #7c3aed; transform: translateY(-1px); text-decoration: none !important; }
.fineweb-page .s-btn-secondary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: var(--el-bg-color); color: var(--el-text-color-primary) !important;
border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: border-color 0.2s, background 0.2s;
cursor: pointer;
text-decoration: none !important;
}
.fineweb-page .s-btn-secondary:hover { background: var(--el-bg-color-page); text-decoration: none !important; }
.fineweb-hero {
padding: 100px 0 80px;
text-align: center;
background: var(--el-bg-color);
position: relative;
overflow: hidden;
}
.fineweb-hero::before {
content: '';
position: absolute;
top: -200px; left: 50%;
transform: translateX(-50%);
width: 900px; height: 500px;
background: radial-gradient(ellipse, rgba(139, 92, 246, 0.06) 0%, transparent 70%);
pointer-events: none;
}
.fineweb-page .hero-badge {
display: inline-flex; align-items: center; gap: 8px;
padding: 6px 16px;
background: var(--el-bg-color-page); border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 13px; font-weight: 600; color: var(--el-text-color-regular);
margin-bottom: 28px;
}
.fineweb-page .hero-badge .badge-dot {
width: 6px; height: 6px; background: #10b981; border-radius: 50%;
display: inline-block;
}
.fineweb-hero h1 {
font-size: clamp(36px, 5vw, 60px);
font-weight: 700; line-height: 1.05;
letter-spacing: normal; color: var(--el-text-color-primary);
margin-bottom: 20px;
position: relative;
}
.fineweb-hero h1 span { color: #8b5cf6; }
.fineweb-page .hero-subtitle {
font-size: clamp(16px, 2vw, 20px);
color: var(--el-text-color-regular); line-height: 1.6;
max-width: 620px; margin: 0 auto 56px;
position: relative;
}
.fineweb-page .hero-actions {
display: flex; gap: 12px; justify-content: center;
flex-wrap: wrap; margin-bottom: 56px; position: relative;
}
.fineweb-page .hero-highlights {
display: flex; align-items: center; justify-content: center;
gap: 16px; flex-wrap: wrap; position: relative;
}
.fineweb-page .hero-highlights .h-item { font-size: 14px; color: var(--el-text-color-regular); font-weight: 500; }
.fineweb-page .hero-highlights .h-div { width: 1px; height: 16px; background: var(--el-border-color-light); }
@media (max-width: 640px)
{ .fineweb-page .hero-highlights .h-div { display: none; } .fineweb-page .hero-highlights { gap: 8px 16px; } .fineweb-page .hero-actions { flex-direction: column; align-items: center; } .fineweb-page .hero-actions a { width: 100%; max-width: 280px; justify-content: center; } } .fineweb-page .hero-cover { max-width: 720px; margin: 48px auto 0; border-radius: 16px; overflow: hidden; box-shadow: 0 8px 32px rgba(0,0,0,0.10); } .fineweb-page .hero-cover img { width: 100%; height: auto; display: block; } .fineweb-stats { padding: 48px 0; background: var(--el-bg-color-page); border-top: 1px solid var(--el-border-color-lighter); border-bottom: 1px solid var(--el-border-color-lighter); } .fineweb-page .stats-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 32px; text-align: center; } .fineweb-page .stat-icon { font-size: 28px; margin-bottom: 12px; } .fineweb-page .stat-val { font-size: clamp(28px, 4vw, 40px); font-weight: 700; color: var(--el-text-color-primary); letter-spacing: normal; margin-bottom: 4px; } .fineweb-page .stat-lbl { font-size: 14px; color: var(--el-text-color-secondary); font-weight: 500; } @media (max-width: 768px) { .fineweb-page .stats-grid { grid-template-columns: repeat(2, 1fr); gap: 24px; } } @media (max-width: 480px) { .fineweb-page .stats-grid { grid-template-columns: 1fr; gap: 20px; } } .fineweb-page .features-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px; } .fineweb-page .feat-card { padding: 32px 28px; border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); background: var(--el-bg-color); transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .fineweb-page .feat-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .fineweb-page .feat-icon { font-size: 32px; margin-bottom: 16px; } .fineweb-page .feat-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } .fineweb-page .feat-card p { 
font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .fineweb-page .features-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 640px) { .fineweb-page .features-grid { grid-template-columns: 1fr; } } .fineweb-page .usecases-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 20px; } .fineweb-page .uc-card { padding: 28px 24px; background: var(--el-bg-color); border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); text-align: center; transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .fineweb-page .uc-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .fineweb-page .uc-icon { font-size: 36px; margin-bottom: 16px; } .fineweb-page .uc-card h3 { font-size: 17px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } .fineweb-page .uc-card p { font-size: 14px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .fineweb-page .usecases-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 480px) { .fineweb-page .usecases-grid { grid-template-columns: 1fr; } } .fineweb-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; max-width: 860px; margin: 0 auto; } .markdown-body .fineweb-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; } .fineweb-page .code-bar { display: flex !important; align-items: center !important; justify-content: space-between !important; padding: 12px 20px !important; background: #1e293b !important; border-bottom: 1px solid #334155 !important; } .fineweb-page .code-dots { display: flex; gap: 6px; } .fineweb-page .code-dots i { width: 10px; height: 10px; border-radius: 50%; display: inline-block; } .fineweb-page .code-dots .r { background: 
#ef4444; } .fineweb-page .code-dots .y { background: #f59e0b; } .fineweb-page .code-dots .g { background: #10b981; } .fineweb-page .code-lang { font-size: 12px; color: var(--el-text-color-secondary); font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; } .fineweb-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .markdown-body .fineweb-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .fineweb-page .steps-row { display: flex; align-items: flex-start; justify-content: center; margin-bottom: 48px; } .fineweb-page .stp-card { flex: 1; max-width: 320px; text-align: center; padding: 0 24px; } .fineweb-page .stp-num { font-size: clamp(48px, 6vw, 72px); font-weight: 700; color: #e2e8f0; letter-spacing: -0.04em; line-height: 1; margin-bottom: 20px; } .fineweb-page .stp-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 10px; } .fineweb-page .stp-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } .fineweb-page .stp-conn { width: 60px; height: 2px; background: var(--el-border-color-light); margin-top: 36px; flex-shrink: 0; } .fineweb-page .steps-cta { text-align: center; } @media (max-width: 768px) { .fineweb-page .steps-row { flex-direction: column; align-items: center; gap: 32px; } .fineweb-page .stp-conn { width: 2px; height: 32px; 
margin: 0; } .fineweb-page .stp-card { max-width: 100%; } } .fineweb-cta { padding: 100px 0; background: #0f172a; text-align: center; position: relative; overflow: hidden; } .fineweb-cta::before { content: ''; position: absolute; top: -100px; left: 50%; transform: translateX(-50%); width: 700px; height: 400px; background: radial-gradient(ellipse, rgba(139, 92, 246, 0.12) 0%, transparent 70%); pointer-events: none; } .fineweb-cta h2 { font-size: clamp(28px, 4vw, 44px); font-weight: 700; color: #f8fafc; letter-spacing: normal; margin-bottom: 28px; position: relative; } .fineweb-cta > div > p { font-size: clamp(16px, 2vw, 18px); color: var(--el-text-color-secondary); max-width: 520px; margin: 0 auto 56px; line-height: 1.6; position: relative; } .fineweb-page .cta-actions { display: flex; gap: 12px; justify-content: center; flex-wrap: wrap; position: relative; } .fineweb-page .btn-cta-light { display: inline-flex; align-items: center; gap: 6px; padding: 14px 32px; background: #8b5cf6; color: #ffffff !important; border-radius: 9999px; font-size: 15px; font-weight: 700; transition: background 0.2s, transform 0.15s; text-decoration: none !important; } .fineweb-page .btn-cta-light:hover { background: #7c3aed; transform: translateY(-1px); text-decoration: none !important; } .fineweb-page .btn-cta-ghost { display: inline-flex; align-items: center; padding: 14px 32px; background: transparent; color: #94a3b8 !important; border: 1px solid #334155; border-radius: 9999px; font-size: 15px; font-weight: 600; transition: border-color 0.2s, color 0.2s; text-decoration: none !important; } .fineweb-page .btn-cta-ghost:hover { border-color: var(--el-text-color-regular); color: #e2e8f0 !important; text-decoration: none !important; } .fineweb-page code { background: #ede9fe !important; padding: 2px 8px !important; border-radius: 5px !important; font-size: 13px !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; color: #7c3aed !important; border: 1px 
solid #c4b5fd !important; } .fineweb-page .s-text-dark { color: var(--el-text-color-primary); } .fineweb-page .s-text-brand { color: #8b5cf6; } .fineweb-page .s-section-body { font-size: 16px; color: var(--el-text-color-regular); line-height: 1.8; text-align: center; max-width: 680px; margin: 0 auto; } .fineweb-page .s-section-body p + p { margin-top: 16px; } .fineweb-page .tag-row { display: flex; gap: 8px; flex-wrap: wrap; justify-content: center; margin-top: 16px; } .fineweb-page .tag-item
{
padding: 4px 12px; background: var(--el-bg-color-page);
border: 1px solid var(--el-border-color-light); border-radius: 9999px;
font-size: 12px; font-weight: 600; color: var(--el-text-color-regular);
}
html.dark .fineweb-page { background: var(--el-bg-color); color: var(--el-text-color-primary); }
html.dark .fineweb-page a { color: inherit; }
html.dark .markdown-body .fineweb-page a { color: inherit !important; }
html.dark .markdown-body .fineweb-page a.s-btn-primary,
html.dark .markdown-body .fineweb-page a.btn-cta-light { color: #ffffff !important; }
html.dark .markdown-body .fineweb-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
html.dark .markdown-body .fineweb-page a.btn-cta-ghost { color: #94a3b8 !important; }
html.dark .markdown-body .fineweb-page a.btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
html.dark .fineweb-page .s-bg-white { background: var(--el-bg-color); }
html.dark .fineweb-page .s-bg-gray { background: var(--el-bg-color-page); }
html.dark .fineweb-page .s-bg-dark { background: var(--el-bg-color); }
html.dark .fineweb-page .s-header h2 { color: var(--el-text-color-primary); }
html.dark .fineweb-page .s-header p { color: var(--el-text-color-secondary); }
html.dark .fineweb-page .s-btn-primary { background: #8b5cf6; color: #ffffff !important; }
html.dark .fineweb-page .s-btn-primary:hover { background: #7c3aed; }
html.dark .fineweb-page .s-btn-secondary {
background: #1e293b; color: var(--el-text-color-primary) !important;
border-color: #475569;
}
html.dark .fineweb-page .s-btn-secondary:hover { background: var(--el-border-color); border-color: var(--el-text-color-regular); }
html.dark .fineweb-hero { background: var(--el-bg-color); }
html.dark .fineweb-hero::before {
background: radial-gradient(ellipse, rgba(139, 92, 246, 0.15) 0%, transparent 70%);
}
html.dark .fineweb-page .hero-badge { background: var(--el-bg-color-page); border-color: var(--el-border-color); color: var(--el-text-color-secondary); }
html.dark .fineweb-hero h1 { color: var(--el-text-color-primary); }
html.dark .fineweb-hero h1 span { color: #c4b5fd; }
html.dark .fineweb-page .hero-subtitle { color: var(--el-text-color-secondary); }
html.dark .fineweb-page .hero-highlights .h-item { color: var(--el-text-color-secondary); }
html.dark .fineweb-page .hero-highlights .h-div { background: var(--el-border-color); }
html.dark .fineweb-stats { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .fineweb-page .stat-val { color: var(--el-text-color-primary); }
html.dark .fineweb-page .stat-lbl { color: var(--el-text-color-regular); }
html.dark .fineweb-page .feat-card {
background: var(--el-bg-color-page); border-color: var(--el-border-color);
}
html.dark .fineweb-page .feat-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .fineweb-page .feat-card h3 { color: var(--el-text-color-primary); }
html.dark .fineweb-page .feat-card p { color: var(--el-text-color-secondary); }
html.dark .fineweb-page .uc-card { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .fineweb-page .uc-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .fineweb-page .uc-card h3 { color: var(--el-text-color-primary); }
html.dark .fineweb-page .uc-card p { color: var(--el-text-color-secondary); }
html.dark .fineweb-page .stp-num { color: #334155; }
html.dark .fineweb-page .stp-card h3 { color: var(--el-text-color-primary); }
html.dark .fineweb-page .stp-card p { color: var(--el-text-color-secondary); }
html.dark .fineweb-page .stp-conn { background: var(--el-border-color); }
html.dark .fineweb-page code {
background: #2e1065 !important; color: #ddd6fe !important; border-color: #8b5cf6 !important;
}
html.dark .fineweb-page .s-text-dark { color: var(--el-text-color-primary); }
html.dark .fineweb-page .s-text-brand { color: #c4b5fd; }
html.dark .fineweb-page .s-section-body { color: var(--el-text-color-secondary); }
html.dark .fineweb-page .tag-item { background: var(--el-border-color); border-color: var(--el-text-color-regular); color: var(--el-text-color-secondary); }
html.dark .fineweb-cta { background: #020617; }
html.dark .fineweb-cta::before {
background: radial-gradient(ellipse, rgba(139, 92, 246, 0.2) 0%, transparent 70%);
}
html.dark .fineweb-page .btn-cta-light { color: #ffffff !important; }
html.dark .fineweb-page .btn-cta-ghost { color: #94a3b8 !important; }
html.dark .fineweb-page .btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
</style>
<div class="fineweb-page">
<section class="fineweb-hero">
<div class="s-container-narrow">
<div class="hero-badge">
<span class="badge-dot"></span>
FineWeb Dataset
</div>
<h1>
FineWeb<br/><span>Ultra-large-scale pre-training dataset</span>
</h1>
<p class="hero-subtitle">
FineWeb is an ultra-large-scale, high-quality English web text dataset released by Hugging Face. It contains 15 trillion tokens drawn from 96 Common Crawl snapshots and is designed for pre-training large language models. Its multi-stage quality pipeline (URL filtering, text extraction, deduplication, and model quality scoring) produces a corpus that outperforms comparable datasets such as C4, Dolma, and RedPajama.
Dataset Highlights
An industrial-grade high-quality corpus designed for pre-training large language models
Ultra-large scale
Contains 15 trillion tokens, making it one of the largest open English web text datasets and providing ample training data for GPT- and LLaMA-scale language models.
Quality-first pipeline
Uses a multi-stage data cleaning process (URL filtering, trafilatura text extraction, MinHash deduplication, and model quality scoring) to keep low-quality and duplicated documents out of the corpus.
Surpassing peer datasets
In LLM pre-training benchmarks, models trained on FineWeb significantly outperform those trained on mainstream open datasets such as C4, Dolma, and RedPajama.
Common Crawl source
Extracted from 96 Common Crawl snapshots spanning 2013 to 2024, covering a wide range of web content across more than a decade.
Open license (ODC-By 1.0)
Released under the Open Data Commons Attribution License (ODC-By 1.0), which allows commercial use, modification, and redistribution with proper attribution.
Hugging Face ecosystem
Integrates with the Hugging Face datasets library and supports streaming, so training can start without downloading the full dataset first.
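To make the MinHash deduplication step mentioned above more concrete, here is a minimal sketch of the general technique in pure Python. It is not FineWeb's actual implementation: the shingle size, the number of hash functions, and the helper names (`shingles`, `minhash_signature`, `estimated_jaccard`) are illustrative assumptions.

```python
import hashlib
import re


def shingles(text: str, n: int = 3) -> set[str]:
    """Return the set of word n-grams (shingles) in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def _hash64(value: str, seed: int) -> int:
    """64-bit hash of a string, salted with a seed to simulate independent hash functions."""
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(4, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text: str, num_hashes: int = 64) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    return [
        min((_hash64(g, seed) for g in grams), default=2**64 - 1)
        for seed in range(num_hashes)
    ]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the quiet river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the quiet river bank today"
doc_c = "renewable energy storage costs have fallen sharply during the last decade"

near_duplicate_score = estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
unrelated_score = estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_c))
```

In a deduplication pass, document pairs whose estimated similarity exceeds a chosen threshold would be treated as near-duplicates and all but one copy dropped. Large-scale pipelines typically add locality-sensitive hashing so that not every pair has to be compared.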
Applicable Scenarios
Supports the LLM development process end to end, from model training to academic research
LLM pre-training
Train GPT- or LLaMA-scale language models from scratch on a large, high-quality English corpus
Continued pre-training
Continue pre-training existing foundation models for domain adaptation, improving language understanding in specific fields
Data quality research
Study the design and effects of data filtering pipelines, exploring the impact of data quality scoring on model performance
Benchmarking and evaluation
Compare how datasets of differing quality affect model performance, providing an experimental basis for dataset selection
Data Preview
The following is a JSONL-format example of FineWeb records. Each line contains the text content, source URL, token count, and quality score.
{"text": "Artificial intelligence continues to reshape industries across the globe...", "url": "https://example.com/ai-article", "token_count": 2847, "quality_score": 0.94}
{"text": "The latest advances in renewable energy technology promise...", "url": "https://example.com/energy-tech", "token_count": 1523, "quality_score": 0.91}
{"text": "Understanding neural network architectures requires...", "url": "https://example.com/nn-guide", "token_count": 4102, "quality_score": 0.97}
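Records in this shape can be read with the standard library alone. The sketch below parses the three preview records, keeps those at or above a quality threshold, and totals their token counts. The field names are taken from the preview; the `0.93` threshold and the helper names are hypothetical choices for illustration.

```python
import json
from typing import Iterator

# The three preview records, one JSON object per line.
SAMPLE_JSONL = """\
{"text": "Artificial intelligence continues to reshape industries across the globe...", "url": "https://example.com/ai-article", "token_count": 2847, "quality_score": 0.94}
{"text": "The latest advances in renewable energy technology promise...", "url": "https://example.com/energy-tech", "token_count": 1523, "quality_score": 0.91}
{"text": "Understanding neural network architectures requires...", "url": "https://example.com/nn-guide", "token_count": 4102, "quality_score": 0.97}
"""


def read_records(jsonl_text: str) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_by_quality(records: list[dict], min_score: float) -> list[dict]:
    """Keep records whose quality_score is at least min_score."""
    return [r for r in records if r["quality_score"] >= min_score]


records = list(read_records(SAMPLE_JSONL))
kept = filter_by_quality(records, min_score=0.93)  # hypothetical threshold
total_tokens = sum(r["token_count"] for r in kept)

print(f"kept {len(kept)} of {len(records)} records, {total_tokens} tokens")
```

For real files, the same generator pattern applies to an open file handle, which avoids loading the whole file into memory.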
3 Steps to Get Started Quickly
From browsing to training, quickly integrate FineWeb into your LLM development process
Browse Datasets
Browse the FineWeb dataset on the Ace Data Cloud platform and view details such as metadata, license terms, and data scale.
Stream Load or Download
Stream data partitions through the Hugging Face datasets library, or download specific Common Crawl snapshot subsets on demand.
Integrate into Training Pipeline
Feed the data into your tokenizer and training pipeline. The data works with mainstream deep learning frameworks such as PyTorch, JAX, and TensorFlow.
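The streaming and integration steps above can be sketched as follows. This is a minimal illustration, not an official recipe. It assumes the dataset is published on the Hugging Face Hub as `HuggingFaceFW/fineweb` with a `sample-10BT` configuration, so check the dataset card for the current names. The packing helper and the character-level `toy_tokenize` are hypothetical stand-ins for a real tokenizer.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def stream_fineweb_texts(limit: int, config: str = "sample-10BT") -> Iterator[str]:
    """Yield up to `limit` document texts without downloading the full dataset.

    Needs `pip install datasets` and network access; the repository and
    configuration names are assumptions to verify against the dataset card.
    """
    from datasets import load_dataset  # lazy import so the helpers below run offline

    dataset = load_dataset(
        "HuggingFaceFW/fineweb", name=config, split="train", streaming=True
    )
    for record in islice(dataset, limit):
        yield record["text"]


def pack_token_ids(
    texts: Iterable[str],
    tokenize: Callable[[str], list[int]],
    block_size: int,
    eos_id: int,
) -> Iterator[list[int]]:
    """Concatenate tokenized documents, separated by eos_id, into fixed-size blocks.

    Tokens left over at the end that do not fill a block are dropped.
    """
    buffer: list[int] = []
    for text in texts:
        buffer.extend(tokenize(text))
        buffer.append(eos_id)
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]


def toy_tokenize(text: str) -> list[int]:
    """Stand-in tokenizer: one id per character. Replace with a real tokenizer."""
    return [ord(ch) for ch in text]


# Offline demonstration with two tiny documents.
demo_blocks = list(pack_token_ids(["abc", "de"], toy_tokenize, block_size=3, eos_id=0))
```

With network access, `pack_token_ids(stream_fineweb_texts(1000), tokenizer_fn, block_size=2048, eos_id=...)` would yield training blocks lazily, which a PyTorch, JAX, or TensorFlow input pipeline can then batch.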
Start Exploring FineWeb Data
15 trillion tokens of high-quality English text, openly licensed and available immediately. Whether you are training the next generation of large language models or researching data quality pipelines, FineWeb is your ideal choice.
