.redpajama-page * { box-sizing: border-box; }
.redpajama-page h1, .redpajama-page h2, .redpajama-page h3, .redpajama-page h4, .redpajama-page h5, .redpajama-page h6, .redpajama-page p, .redpajama-page ul, .redpajama-page ol, .redpajama-page li, .redpajama-page pre, .redpajama-page blockquote, .redpajama-page table, .redpajama-page td, .redpajama-page th { margin: 0; padding: 0; }
.redpajama-page {
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
color: var(--el-text-color-primary);
background: var(--el-bg-color);
line-height: 1.6;
}
.redpajama-page a { text-decoration: none; color: inherit; }
.redpajama-page a:hover { text-decoration: none; }
.redpajama-page ul { list-style: none; }
.markdown-body .redpajama-page a { color: inherit !important; text-decoration: none !important; }
.markdown-body .redpajama-page a:hover { text-decoration: none !important; }
.markdown-body .redpajama-page a.s-btn-primary,
.markdown-body .redpajama-page a.btn-cta-light { color: #ffffff !important; }
.markdown-body .redpajama-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
.markdown-body .redpajama-page a.btn-cta-ghost { color: #94a3b8 !important; }
.markdown-body .redpajama-page a.btn-cta-ghost:hover { color: #e2e8f0 !important; }
.markdown-body .redpajama-page h1, .markdown-body .redpajama-page h2 { border-bottom: none !important; padding-bottom: 0 !important; }
.redpajama-page .s-container { max-width: 1200px; margin: 0 auto; padding: 0 24px; }
.redpajama-page .s-container-narrow { max-width: 800px; margin: 0 auto; padding: 0 24px; }
.redpajama-page .s-container-wide { max-width: 1100px; margin: 0 auto; padding: 0 32px; }
.redpajama-page .s-section { padding: 80px 0; }
.redpajama-page .s-section-lg { padding: 100px 0; }
.redpajama-page .s-section-sm { padding: 48px 0; }
.redpajama-page .s-bg-white { background: var(--el-bg-color); }
.redpajama-page .s-bg-gray { background: var(--el-bg-color-page); }
.redpajama-page .s-bg-dark { background: #0f172a; color: #f8fafc; }
.redpajama-page .s-header { text-align: center; margin-bottom: 64px; }
.redpajama-page .s-header h2 {
font-size: clamp(28px, 4vw, 40px);
font-weight: 700;
color: var(--el-text-color-primary);
letter-spacing: normal;
margin-bottom: 20px;
line-height: 1.15;
}
.redpajama-page .s-header p {
font-size: clamp(16px, 2vw, 18px);
color: var(--el-text-color-regular);
max-width: 640px;
margin: 0 auto;
line-height: 1.6;
}
.redpajama-page .s-bg-dark .s-header h2 { color: #f8fafc; }
.redpajama-page .s-bg-dark .s-header p { color: var(--el-text-color-secondary); }
.redpajama-page .s-btn-primary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: #dc2626; color: #ffffff !important;
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: background 0.2s, transform 0.15s;
border: none; cursor: pointer;
text-decoration: none !important;
}
.redpajama-page .s-btn-primary:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; }
.redpajama-page .s-btn-secondary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: var(--el-bg-color); color: var(--el-text-color-primary) !important;
border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: border-color 0.2s, background 0.2s;
cursor: pointer;
text-decoration: none !important;
}
.redpajama-page .s-btn-secondary:hover { background: var(--el-bg-color-page); text-decoration: none !important; }
.redpajama-hero {
padding: 100px 0 80px;
text-align: center;
background: var(--el-bg-color);
position: relative;
overflow: hidden;
}
.redpajama-hero::before {
content: '';
position: absolute;
top: -200px; left: 50%;
transform: translateX(-50%);
width: 900px; height: 500px;
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.06) 0%, transparent 70%);
pointer-events: none;
}
.redpajama-page .hero-badge {
display: inline-flex; align-items: center; gap: 8px;
padding: 6px 16px;
background: var(--el-bg-color-page); border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 13px; font-weight: 600; color: var(--el-text-color-regular);
margin-bottom: 28px;
}
.redpajama-page .hero-badge .badge-dot {
width: 6px; height: 6px; background: #10b981; border-radius: 50%;
display: inline-block;
}
.redpajama-hero h1 {
font-size: clamp(36px, 5vw, 60px);
font-weight: 700; line-height: 1.05;
letter-spacing: normal; color: var(--el-text-color-primary);
margin-bottom: 20px;
position: relative;
}
.redpajama-hero h1 span { color: #dc2626; }
.redpajama-page .hero-subtitle {
font-size: clamp(16px, 2vw, 20px);
color: var(--el-text-color-regular); line-height: 1.6;
max-width: 620px; margin: 0 auto 56px;
position: relative;
}
.redpajama-page .hero-actions {
display: flex; gap: 12px; justify-content: center;
flex-wrap: wrap; margin-bottom: 56px; position: relative;
}
.redpajama-page .hero-highlights {
display: flex; align-items: center; justify-content: center;
gap: 16px; flex-wrap: wrap; position: relative;
}
.redpajama-page .hero-highlights .h-item { font-size: 14px; color: var(--el-text-color-regular); font-weight: 500; }
.redpajama-page .hero-highlights .h-div { width: 1px; height: 16px; background: var(--el-border-color-light); }
@media (max-width: 640px) {
.redpajama-page .hero-highlights .h-div { display: none; }
.redpajama-page .hero-highlights { gap: 8px 16px; }
.redpajama-page .hero-actions { flex-direction: column; align-items: center; }
.redpajama-page .hero-actions a { width: 100%; max-width: 280px; justify-content: center; }
}
.redpajama-page .hero-cover {
max-width: 720px; margin: 48px auto 0;
border-radius: 16px; overflow: hidden;
box-shadow: 0 8px 32px rgba(0,0,0,0.10);
}
.redpajama-page .hero-cover img { width: 100%; height: auto; display: block; }
.redpajama-stats {
padding: 48px 0; background: var(--el-bg-color-page);
border-top: 1px solid var(--el-border-color-lighter);
border-bottom: 1px solid var(--el-border-color-lighter);
}
.redpajama-page .stats-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 32px; text-align: center; }
.redpajama-page .stat-icon { font-size: 28px; margin-bottom: 12px; }
.redpajama-page .stat-val {
font-size: clamp(28px, 4vw, 40px); font-weight: 700;
color: var(--el-text-color-primary); letter-spacing: normal; margin-bottom: 4px;
}
.redpajama-page .stat-lbl { font-size: 14px; color: var(--el-text-color-secondary); font-weight: 500; }
@media (max-width: 768px) { .redpajama-page .stats-grid { grid-template-columns: repeat(2, 1fr); gap: 24px; } }
@media (max-width: 480px) { .redpajama-page .stats-grid { grid-template-columns: 1fr; gap: 20px; } }
.redpajama-page .features-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px; }
.redpajama-page .feat-card {
padding: 32px 28px; border: none; border-radius: 20px;
box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); background: var(--el-bg-color);
transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s;
}
.redpajama-page .feat-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); }
.redpajama-page .feat-icon { font-size: 32px; margin-bottom: 16px; }
.redpajama-page .feat-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; }
.redpajama-page .feat-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; }
@media (max-width: 1024px) { .redpajama-page .features-grid { grid-template-columns: repeat(2, 1fr); } }
@media (max-width: 640px) { .redpajama-page .features-grid { grid-template-columns: 1fr; } }
.redpajama-page .usecases-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 20px; }
.redpajama-page .uc-card {
padding: 28px 24px; background: var(--el-bg-color);
border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08);
text-align: center;
transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s;
}
.redpajama-page .uc-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); }
.redpajama-page .uc-icon { font-size: 36px; margin-bottom: 16px; }
.redpajama-page .uc-card h3 { font-size: 17px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; }
.redpajama-page .uc-card p { font-size: 14px; color: var(--el-text-color-regular); line-height: 1.6; }
@media (max-width: 1024px) { .redpajama-page .usecases-grid { grid-template-columns: repeat(2, 1fr); } }
@media (max-width: 480px) { .redpajama-page .usecases-grid { grid-template-columns: 1fr; } }
.redpajama-page .code-wrap {
border-radius: 16px !important; overflow: hidden !important;
border: 1px solid #fca5a5 !important; background: #fef2f2 !important;
max-width: 860px; margin: 0 auto;
}
.markdown-body .redpajama-page .code-wrap {
border-radius: 16px !important; overflow: hidden !important;
border: 1px solid #fca5a5 !important; background: #fef2f2 !important;
}
.redpajama-page .code-bar {
display: flex !important; align-items: center !important; justify-content: space-between !important;
padding: 12px 20px !important; background: #fee2e2 !important;
border-bottom: 1px solid #fca5a5 !important;
}
.redpajama-page .code-dots { display: flex; gap: 6px; }
.redpajama-page .code-dots i { width: 10px; height: 10px; border-radius: 50%; display: inline-block; }
.redpajama-page .code-dots .r { background: #ef4444; }
.redpajama-page .code-dots .y { background: #f59e0b; }
.redpajama-page .code-dots .g { background: #10b981; }
.redpajama-page .code-lang { font-size: 12px; color: #7f1d1d; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; }
.redpajama-page .code-block,
.markdown-body .redpajama-page .code-block {
padding: 24px !important; margin: 0 !important; overflow-x: auto !important;
font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important;
font-size: 13.5px !important; line-height: 1.7 !important; color: #7f1d1d !important;
white-space: pre !important; background: transparent !important;
border: none !important; border-radius: 0 !important;
}
.redpajama-page .steps-row { display: flex; align-items: flex-start; justify-content: center; margin-bottom: 48px; }
.redpajama-page .stp-card { flex: 1; max-width: 320px; text-align: center; padding: 0 24px; }
.redpajama-page .stp-num {
font-size: clamp(48px, 6vw, 72px); font-weight: 700; color: #e2e8f0;
letter-spacing: -0.04em; line-height: 1; margin-bottom: 20px;
}
.redpajama-page .stp-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 10px; }
.redpajama-page .stp-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; }
.redpajama-page .stp-conn { width: 60px; height: 2px; background: var(--el-border-color-light); margin-top: 36px; flex-shrink: 0; }
.redpajama-page .steps-cta { text-align: center; }
@media (max-width: 768px) {
.redpajama-page .steps-row { flex-direction: column; align-items: center; gap: 32px; }
.redpajama-page .stp-conn { width: 2px; height: 32px; margin: 0; }
.redpajama-page .stp-card { max-width: 100%; }
}
.redpajama-cta { padding: 100px 0; background: #0f172a; text-align: center; position: relative; overflow: hidden; }
.redpajama-cta::before {
content: '';
position: absolute;
top: -100px; left: 50%;
transform: translateX(-50%);
width: 700px; height: 400px;
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.12) 0%, transparent 70%);
pointer-events: none;
}
.redpajama-cta h2 {
font-size: clamp(28px, 4vw, 44px); font-weight: 700; color: #f8fafc;
letter-spacing: normal; margin-bottom: 28px; position: relative;
}
.redpajama-cta > div > p {
font-size: clamp(16px, 2vw, 18px); color: var(--el-text-color-secondary);
max-width: 520px; margin: 0 auto 56px; line-height: 1.6; position: relative;
}
.redpajama-page .cta-actions { display: flex; gap: 12px; justify-content: center; flex-wrap: wrap; position: relative; }
.redpajama-page .btn-cta-light {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 32px;
background: #dc2626; color: #ffffff !important;
border-radius: 9999px; font-size: 15px; font-weight: 700;
transition: background 0.2s, transform 0.15s;
text-decoration: none !important;
}
.redpajama-page .btn-cta-light:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; }
.redpajama-page .btn-cta-ghost {
display: inline-flex; align-items: center;
padding: 14px 32px;
background: transparent; color: #94a3b8 !important;
border: 1px solid #334155;
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: border-color 0.2s, color 0.2s;
text-decoration: none !important;
}
.redpajama-page .btn-cta-ghost:hover { border-color: var(--el-text-color-regular); color: #e2e8f0 !important; text-decoration: none !important; }
.redpajama-page code {
background: #fef2f2 !important; padding: 2px 8px !important;
border-radius: 5px !important; font-size: 13px !important;
font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important;
color: #dc2626 !important; border: 1px solid #fca5a5 !important;
}
.redpajama-page .s-text-dark { color: var(--el-text-color-primary); }
.redpajama-page .s-text-brand { color: #dc2626; }
.redpajama-page .s-section-body {
font-size: 16px; color: var(--el-text-color-regular); line-height: 1.8;
text-align: center; max-width: 680px; margin: 0 auto;
}
.redpajama-page .s-section-body p + p { margin-top: 16px; }
.redpajama-page .tag-row { display: flex; gap: 8px; flex-wrap: wrap; justify-content: center; margin-top: 16px; }
.redpajama-page .tag-item {
padding: 4px 12px; background: var(--el-bg-color-page);
border: 1px solid var(--el-border-color-light); border-radius: 9999px;
font-size: 12px; font-weight: 600; color: var(--el-text-color-regular);
}
html.dark .redpajama-page { background: var(--el-bg-color); color: var(--el-text-color-primary); }
html.dark .redpajama-page a { color: inherit; }
html.dark .markdown-body .redpajama-page a { color: inherit !important; }
html.dark .markdown-body .redpajama-page a.s-btn-primary,
html.dark .markdown-body .redpajama-page a.btn-cta-light { color: #ffffff !important; }
html.dark .markdown-body .redpajama-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
html.dark .markdown-body .redpajama-page a.btn-cta-ghost { color: #94a3b8 !important; }
html.dark .markdown-body .redpajama-page a.btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
html.dark .redpajama-page .s-bg-white { background: var(--el-bg-color); }
html.dark .redpajama-page .s-bg-gray { background: var(--el-bg-color-page); }
html.dark .redpajama-page .s-bg-dark { background: var(--el-bg-color); }
html.dark .redpajama-page .s-header h2 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .s-header p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .s-btn-primary { background: #dc2626; color: #ffffff !important; }
html.dark .redpajama-page .s-btn-primary:hover { background: #b91c1c; }
html.dark .redpajama-page .s-btn-secondary {
background: #1e293b; color: var(--el-text-color-primary) !important;
border-color: #475569;
}
html.dark .redpajama-page .s-btn-secondary:hover { background: var(--el-border-color); border-color: var(--el-text-color-regular); }
html.dark .redpajama-hero { background: var(--el-bg-color); }
html.dark .redpajama-hero::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.15) 0%, transparent 70%);
}
html.dark .redpajama-page .hero-badge { background: var(--el-bg-color-page); border-color: var(--el-border-color); color: var(--el-text-color-secondary); }
html.dark .redpajama-hero h1 { color: var(--el-text-color-primary); }
html.dark .redpajama-hero h1 span { color: #f87171; }
html.dark .redpajama-page .hero-subtitle { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .hero-highlights .h-item { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .hero-highlights .h-div { background: var(--el-border-color); }
html.dark .redpajama-stats { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .redpajama-page .stat-val { color: var(--el-text-color-primary); }
html.dark .redpajama-page .stat-lbl { color: var(--el-text-color-regular); }
html.dark .redpajama-page .feat-card {
background: var(--el-bg-color-page); border-color: var(--el-border-color);
}
html.dark .redpajama-page .feat-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .redpajama-page .feat-card h3 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .feat-card p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .uc-card { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .redpajama-page .uc-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .redpajama-page .uc-card h3 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .uc-card p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .stp-num { color: #334155; }
html.dark .redpajama-page .stp-card h3 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .stp-card p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .stp-conn { background: var(--el-border-color); }
html.dark .redpajama-page code {
background: #450a0a !important; color: #fecaca !important; border-color: #7f1d1d !important;
}
html.dark .redpajama-page .code-wrap {
border-color: #7f1d1d !important; background: #450a0a !important;
}
html.dark .redpajama-page .code-bar {
background: #7f1d1d !important; border-bottom-color: #991b1b !important;
}
html.dark .redpajama-page .code-block {
color: #fecaca !important;
}
html.dark .redpajama-page .code-lang { color: #fecaca; }
html.dark .redpajama-page .s-text-dark { color: var(--el-text-color-primary); }
html.dark .redpajama-page .s-text-brand { color: #f87171; }
html.dark .redpajama-page .s-section-body { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .tag-item { background: var(--el-border-color); border-color: var(--el-text-color-regular); color: var(--el-text-color-secondary); }
html.dark .redpajama-cta { background: #020617; }
html.dark .redpajama-cta::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.2) 0%, transparent 70%);
}
html.dark .redpajama-page .btn-cta-light { color: #ffffff !important; }
html.dark .redpajama-page .btn-cta-ghost { color: #94a3b8 !important; }
html.dark .redpajama-page .btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
</style>
<div class="redpajama-page">
<section class="redpajama-hero">
<div class="s-container-narrow">
<div class="hero-badge">
<span class="badge-dot"></span>
RedPajama-Data-1T
</div>
<h1>
RedPajama<br/><span>Data-1T</span>
</h1>
<p class="hero-subtitle">
RedPajama-Data-1T is an open reproduction of the LLaMA training dataset, created by Together AI. It contains 1.2 trillion tokens drawn from seven data sources (CommonCrawl, C4, GitHub, Wikipedia, Books, ArXiv, and StackExchange), is released under the Apache 2.0 license, and is designed to support transparent, reproducible training of large language models.
Dataset Highlights
An open and transparent trillion-scale pre-training dataset to support research and development of large language models
Trillion-Token Scale
Contains 1.2 trillion tokens, built to follow the LLaMA data recipe at a comparable scale and providing ample data for pre-training large models.
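To give a sense of that scale, here is a back-of-envelope sketch of how many optimizer steps a single pass over 1.2 trillion tokens takes. The 4-million-token global batch size is an assumed value chosen for illustration; it is not a figure from this page.

```python
def steps_for_tokens(total_tokens: int, batch_tokens: int) -> int:
    """Number of optimizer steps needed to consume total_tokens once.

    Uses ceiling division so a final partial batch still counts as a step.
    """
    if batch_tokens <= 0:
        raise ValueError("batch_tokens must be positive")
    return -(-total_tokens // batch_tokens)


TOTAL_TOKENS = 1_200_000_000_000  # 1.2 trillion tokens, from the dataset description
BATCH_TOKENS = 4_000_000          # assumption: hypothetical global batch size in tokens

steps = steps_for_tokens(TOTAL_TOKENS, BATCH_TOKENS)
print(f"{steps:,} steps for one pass")  # prints "300,000 steps for one pass"
```

Changing the assumed batch size scales the step count inversely: halving it doubles the number of steps.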
Seven Major Data Sources
Covers CommonCrawl, C4, GitHub, Wikipedia, Books, ArXiv, and StackExchange, spanning web pages, source code, encyclopedic articles, books, academic papers, and Q&A discussions.
Transparent Processing Workflow
The data processing and filtering pipelines are fully documented, so every operation can be traced and audited, and the origin and quality of the data remain transparent.
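One way to make every operation traceable is to run the pipeline as a list of named steps and record how many documents each step keeps. The sketch below assumes documents are plain strings and uses hypothetical step names. It illustrates the idea and is not the RedPajama pipeline itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# A processing step maps a list of documents to a (possibly smaller) list.
Step = Callable[[list[str]], list[str]]


@dataclass
class AuditRecord:
    step: str
    docs_in: int
    docs_out: int


def run_pipeline(
    docs: Iterable[str], steps: list[tuple[str, Step]]
) -> tuple[list[str], list[AuditRecord]]:
    """Apply each named step in order and log document counts before and after it."""
    current = list(docs)
    log: list[AuditRecord] = []
    for name, fn in steps:
        before = len(current)
        current = fn(current)
        log.append(AuditRecord(step=name, docs_in=before, docs_out=len(current)))
    return current, log


# Hypothetical example steps
steps = [
    ("strip_whitespace", lambda ds: [d.strip() for d in ds]),
    ("drop_empty", lambda ds: [d for d in ds if d]),
    ("drop_short", lambda ds: [d for d in ds if len(d.split()) >= 3]),
]

docs = ["  hello world again  ", "", "too short", "a longer example document"]
kept, log = run_pipeline(docs, steps)
for rec in log:
    print(f"{rec.step}: {rec.docs_in} -> {rec.docs_out}")
```

Because the log is plain data, it can be written alongside the output so that anyone re-running the pipeline can compare counts step by step.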
Apache 2.0 License
Released under the permissive Apache 2.0 open-source license, which allows both academic research and commercial use.
Quality Filtering
Each data source is cleaned with its own domain-specific rules, combining filters such as deduplication, language identification, and quality scoring.
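As a rough illustration of two of these filters, the sketch below removes exact duplicates by hashing normalized text. It then applies a simple length and alphabetic-character heuristic as a stand-in for quality scoring. The normalization rules and thresholds are assumptions, and language identification is omitted. Larger pipelines often use more elaborate methods, such as fuzzy deduplication and model-based quality classifiers.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash the same."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def dedup_exact(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing SHA-256 hashes of normalized text."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def passes_quality(doc: str, min_words: int = 5, min_alpha_ratio: float = 0.6) -> bool:
    """Toy quality heuristic with assumed thresholds: enough words, mostly letters."""
    if len(doc.split()) < min_words:
        return False
    non_space = [c for c in doc if not c.isspace()]
    if not non_space:
        return False
    alpha = sum(c.isalpha() for c in non_space)
    return alpha / len(non_space) >= min_alpha_ratio


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the LAZY dog.",  # duplicate after normalization
    "1234 5678 9012 3456 7890",                       # mostly digits
    "Too short.",
]
cleaned = [d for d in dedup_exact(docs) if passes_quality(d)]
print(cleaned)
```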
Fully Reproducible
The full methodology and processing workflows are open source, so researchers can reproduce, customize, and extend the dataset to fit their needs.
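Reproducible data work depends on making every sampling decision deterministic. The sketch below assigns documents to a held-out split by hashing a document ID. The same IDs therefore land in the same split on every run and machine. The ID format, the salt string, and the 1% ratio are hypothetical choices. They are not part of RedPajama's published workflow.

```python
import hashlib


def in_holdout(doc_id: str, holdout_fraction: float = 0.01, salt: str = "split-v1") -> bool:
    """Deterministically decide whether a document belongs to the held-out split.

    Hashes salt + doc_id and maps the first 6 bytes of the digest to a number
    in [0, 1). Unlike random.random(), this gives the same answer on every run.
    """
    digest = hashlib.sha256(f"{salt}:{doc_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # exact in a float, always < 1
    return bucket < holdout_fraction


ids = [f"doc-{i}" for i in range(10_000)]  # hypothetical document IDs
holdout = [d for d in ids if in_holdout(d)]
print(len(holdout), "of", len(ids), "documents held out")
```

Changing the salt produces a different but equally reproducible split. This is useful when a fresh evaluation set is needed without losing the ability to regenerate the old one.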
Applicable Scenarios
From model pre-training to data research, covering various large language model development scenarios
LLM Pre-training
Train large language models from scratch with an openly documented data recipe that follows the one described for LLaMA
Data Ablation Experiments
Study how each data source affects model performance, quantifying the contribution of data from different domains
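A simple way to set up such ablations is leave-one-source-out: drop one source from the mixture and renormalize the remaining weights. The seven sources are the ones listed on this page. The lowercase identifiers are illustrative, and the equal baseline weights are placeholders, not RedPajama's actual proportions.

```python
def leave_one_out(weights: dict[str, float]) -> dict[str, dict[str, float]]:
    """For each source, return a mixture without it, rescaled so the weights sum to 1."""
    ablations = {}
    for dropped in weights:
        rest = {s: w for s, w in weights.items() if s != dropped}
        total = sum(rest.values())
        ablations[f"no_{dropped}"] = {s: w / total for s, w in rest.items()}
    return ablations


sources = ["commoncrawl", "c4", "github", "wikipedia", "books", "arxiv", "stackexchange"]
baseline = {s: 1 / len(sources) for s in sources}  # placeholder: equal weights

for name, mix in leave_one_out(baseline).items():
    print(name, {s: round(w, 3) for s, w in mix.items()})
```

Each resulting mixture would train a separate model. Comparing each ablated model against the full-mixture baseline estimates the dropped source's contribution.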
Curriculum Learning
Design multi-stage training curricula across data domains, tuning data mixing ratios and training schedules
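The sketch below shows one way to express a multi-stage curriculum. Each stage covers a range of training steps and has its own mixing weights, and each example's source is drawn according to the active stage's weights. The stage boundaries, the source subset, and the weights are hypothetical values for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Stage:
    until_step: int            # stage applies to steps < until_step
    weights: dict[str, float]  # relative sampling weight per source


# Hypothetical two-stage curriculum: mostly web data first, then more code and papers.
CURRICULUM = [
    Stage(until_step=1_000, weights={"commoncrawl": 0.7, "wikipedia": 0.2, "github": 0.1}),
    Stage(until_step=2_000, weights={"commoncrawl": 0.4, "github": 0.3, "arxiv": 0.3}),
]


def weights_at(step: int) -> dict[str, float]:
    """Return the mixing weights of the stage that contains `step`."""
    for stage in CURRICULUM:
        if step < stage.until_step:
            return stage.weights
    return CURRICULUM[-1].weights  # keep the last stage's mix after the schedule ends


def sample_sources(step: int, n: int, rng: random.Random) -> list[str]:
    """Pick a source for each of n examples according to the active stage's weights."""
    w = weights_at(step)
    return rng.choices(list(w), weights=list(w.values()), k=n)


rng = random.Random(0)  # fixed seed so the draws are reproducible
print(sample_sources(step=10, n=8, rng=rng))
print(sample_sources(step=1_500, n=8, rng=rng))
```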
Model Comparison
Train different model architectures on the same standardized data so that comparisons are not confounded by differences in the data
Quick Start
Access the RedPajama dataset via the API
<pre class="code-block">import requests

url = "https://api.acedata.cloud/datasets/redpajama"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}
params = {"source": "wikipedia", "limit": 10}

response = requests.get(url, headers=headers, params=params, timeout=30)
response.raise_for_status()  # stop early on 4xx/5xx errors
data = response.json()

# Print the first 200 characters of each returned data entry
for item in data.get("data", []):
    print(item.get("text", "")[:200])
    print("---")</pre>
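To reuse the request above, it can be wrapped in a small function. This is a sketch under assumptions. The endpoint, the `source` and `limit` parameters, and the `data`/`text` response fields come from the example above. The function name `fetch_texts` and its `session` argument are hypothetical. Passing the session in, for example a `requests.Session()`, also makes the function easy to test with a stub.

```python
from typing import Any, Protocol

API_URL = "https://api.acedata.cloud/datasets/redpajama"


class HTTPResponse(Protocol):
    def raise_for_status(self) -> None: ...
    def json(self) -> Any: ...


class HTTPSession(Protocol):
    def get(self, url: str, **kwargs: Any) -> HTTPResponse: ...


def fetch_texts(session: HTTPSession, token: str, source: str, limit: int = 10) -> list[str]:
    """Request `limit` records from one source and return their `text` fields."""
    response = session.get(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        params={"source": source, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    return [item.get("text", "") for item in response.json().get("data", [])]
```

With `requests` installed, a call looks like `fetch_texts(requests.Session(), "YOUR_API_TOKEN", "wikipedia")`.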
Get Started in 3 Steps
From registration to your first request, you can start accessing trillion-scale pre-training data in just a few minutes.
Register an Account
Create an Ace Data Cloud account at platform.acedata.cloud to complete developer onboarding.
Obtain API Key
Create an API key in the console. The key authenticates your requests and authorizes access to the data.
Start Using the Dataset API
Query the RedPajama-Data-1T dataset through the API and download pre-training data from any of the seven data sources as needed.
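Rather than pasting the key into source code as in the Quick Start example, you can read it from an environment variable. The variable name `ACEDATA_API_TOKEN` below is a hypothetical choice and is not defined by the platform.

```python
import os


def auth_headers(env_var: str = "ACEDATA_API_TOKEN") -> dict[str, str]:
    """Build the Bearer authorization header from an environment variable.

    Raises a clear error instead of sending a request with an empty token.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {"Authorization": f"Bearer {token}"}
```

Keeping the key outside the code also keeps it out of version control and shared notebooks.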
Start Exploring the RedPajama Dataset
Open license, trillion-scale, completely transparent. Whether you are training large language models or conducting data research, RedPajama-Data-1T is the ideal choice.
