.redpajama-page * { box-sizing: border-box; }
.redpajama-page h1, .redpajama-page h2, .redpajama-page h3, .redpajama-page h4, .redpajama-page h5, .redpajama-page h6, .redpajama-page p, .redpajama-page ul, .redpajama-page ol, .redpajama-page li, .redpajama-page pre, .redpajama-page blockquote, .redpajama-page table, .redpajama-page td, .redpajama-page th { margin: 0; padding: 0; }
.redpajama-page {
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
color: var(--el-text-color-primary);
background: var(--el-bg-color);
line-height: 1.6;
}
.redpajama-page a { text-decoration: none; color: inherit; }
.redpajama-page a:hover { text-decoration: none; }
.redpajama-page ul { list-style: none; }
.markdown-body .redpajama-page a { color: inherit !important; text-decoration: none !important; }
.markdown-body .redpajama-page a:hover { text-decoration: none !important; }
.markdown-body .redpajama-page a.s-btn-primary,
.markdown-body .redpajama-page a.btn-cta-light { color: #ffffff !important; }
.markdown-body .redpajama-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
.markdown-body .redpajama-page a.btn-cta-ghost { color: #94a3b8 !important; }
.markdown-body .redpajama-page a.btn-cta-ghost:hover { color: #e2e8f0 !important; }
.markdown-body .redpajama-page h1, .markdown-body .redpajama-page h2 { border-bottom: none !important; padding-bottom: 0 !important; }
.redpajama-page .s-container { max-width: 1200px; margin: 0 auto; padding: 0 24px; }
.redpajama-page .s-container-narrow { max-width: 800px; margin: 0 auto; padding: 0 24px; }
.redpajama-page .s-container-wide { max-width: 1100px; margin: 0 auto; padding: 0 32px; }
.redpajama-page .s-section { padding: 80px 0; }
.redpajama-page .s-section-lg { padding: 100px 0; }
.redpajama-page .s-section-sm { padding: 48px 0; }
.redpajama-page .s-bg-white { background: var(--el-bg-color); }
.redpajama-page .s-bg-gray { background: var(--el-bg-color-page); }
.redpajama-page .s-bg-dark { background: #0f172a; color: #f8fafc; }
.redpajama-page .s-header { text-align: center; margin-bottom: 64px; }
.redpajama-page .s-header h2 {
font-size: clamp(28px, 4vw, 40px);
font-weight: 700;
color: var(--el-text-color-primary);
letter-spacing: normal;
margin-bottom: 20px;
line-height: 1.15;
}
.redpajama-page .s-header p {
font-size: clamp(16px, 2vw, 18px);
color: var(--el-text-color-regular);
max-width: 640px;
margin: 0 auto;
line-height: 1.6;
}
.redpajama-page .s-bg-dark .s-header h2 { color: #f8fafc; }
.redpajama-page .s-bg-dark .s-header p { color: var(--el-text-color-secondary); }
.redpajama-page .s-btn-primary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: #dc2626; color: #ffffff !important;
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: background 0.2s, transform 0.15s;
border: none; cursor: pointer;
text-decoration: none !important;
}
.redpajama-page .s-btn-primary:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; }
.redpajama-page .s-btn-secondary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: var(--el-bg-color); color: var(--el-text-color-primary) !important;
border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: border-color 0.2s, background 0.2s;
cursor: pointer;
text-decoration: none !important;
}
.redpajama-page .s-btn-secondary:hover { background: var(--el-bg-color-page); text-decoration: none !important; }
.redpajama-hero {
padding: 100px 0 80px;
text-align: center;
background: var(--el-bg-color);
position: relative;
overflow: hidden;
}
.redpajama-hero::before {
content: '';
position: absolute;
top: -200px; left: 50%;
transform: translateX(-50%);
width: 900px; height: 500px;
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.06) 0%, transparent 70%);
pointer-events: none;
}
.redpajama-page .hero-badge {
display: inline-flex; align-items: center; gap: 8px;
padding: 6px 16px;
background: var(--el-bg-color-page); border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 13px; font-weight: 600; color: var(--el-text-color-regular);
margin-bottom: 28px;
}
.redpajama-page .hero-badge .badge-dot {
width: 6px; height: 6px; background: #10b981; border-radius: 50%;
display: inline-block;
}
.redpajama-hero h1 {
font-size: clamp(36px, 5vw, 60px);
font-weight: 700; line-height: 1.05;
letter-spacing: normal; color: var(--el-text-color-primary);
margin-bottom: 20px;
position: relative;
}
.redpajama-hero h1 span { color: #dc2626; }
.redpajama-page .hero-subtitle {
font-size: clamp(16px, 2vw, 20px);
color: var(--el-text-color-regular); line-height: 1.6;
max-width: 620px; margin: 0 auto 56px;
position: relative;
}
.redpajama-page .hero-actions {
display: flex; gap: 12px; justify-content: center;
flex-wrap: wrap; margin-bottom: 56px; position: relative;
}
.redpajama-page .hero-highlights {
display: flex; align-items: center; justify-content: center;
gap: 16px; flex-wrap: wrap; position: relative;
}
.redpajama-page .hero-highlights .h-item { font-size: 14px; color: var(--el-text-color-regular); font-weight: 500; }
.redpajama-page .hero-highlights .h-div { width: 1px; height: 16px; background: var(--el-border-color-light); }
@media (max-width: 640px) 

{ .redpajama-page .hero-highlights .h-div { display: none; } .redpajama-page .hero-highlights { gap: 8px 16px; } .redpajama-page .hero-actions { flex-direction: column; align-items: center; } .redpajama-page .hero-actions a { width: 100%; max-width: 280px; justify-content: center; } } .redpajama-page .hero-cover { max-width: 720px; margin: 48px auto 0; border-radius: 16px; overflow: hidden; box-shadow: 0 8px 32px rgba(0,0,0,0.10); } .redpajama-page .hero-cover img { width: 100%; height: auto; display: block; } .redpajama-stats { padding: 48px 0; background: var(--el-bg-color-page); border-top: 1px solid var(--el-border-color-lighter); border-bottom: 1px solid var(--el-border-color-lighter); } .redpajama-page .stats-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 32px; text-align: center; } .redpajama-page .stat-icon { font-size: 28px; margin-bottom: 12px; } .redpajama-page .stat-val { font-size: clamp(28px, 4vw, 40px); font-weight: 700; color: var(--el-text-color-primary); letter-spacing: normal; margin-bottom: 4px; } .redpajama-page .stat-lbl { font-size: 14px; color: var(--el-text-color-secondary); font-weight: 500; } @media (max-width: 768px) { .redpajama-page .stats-grid { grid-template-columns: repeat(2, 1fr); gap: 24px; } } @media (max-width: 480px) { .redpajama-page .stats-grid { grid-template-columns: 1fr; gap: 20px; } } .redpajama-page .features-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px; } .redpajama-page .feat-card { padding: 32px 28px; border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); background: var(--el-bg-color); transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .redpajama-page .feat-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .redpajama-page .feat-icon { font-size: 32px; margin-bottom: 16px; } .redpajama-page .feat-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } 
.redpajama-page .feat-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .redpajama-page .features-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 640px) { .redpajama-page .features-grid { grid-template-columns: 1fr; } } .redpajama-page .usecases-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 20px; } .redpajama-page .uc-card { padding: 28px 24px; background: var(--el-bg-color); border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); text-align: center; transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .redpajama-page .uc-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .redpajama-page .uc-icon { font-size: 36px; margin-bottom: 16px; } .redpajama-page .uc-card h3 { font-size: 17px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } .redpajama-page .uc-card p { font-size: 14px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .redpajama-page .usecases-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 480px) { .redpajama-page .usecases-grid { grid-template-columns: 1fr; } } .redpajama-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #fca5a5 !important; background: #fef2f2 !important; max-width: 860px; margin: 0 auto; } .markdown-body .redpajama-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #fca5a5 !important; background: #fef2f2 !important; } .redpajama-page .code-bar { display: flex !important; align-items: center !important; justify-content: space-between !important; padding: 12px 20px !important; background: #fee2e2 !important; border-bottom: 1px solid #fca5a5 !important; } .redpajama-page .code-dots { display: flex; gap: 6px; } .redpajama-page .code-dots i { width: 10px; height: 10px; border-radius: 50%; 
display: inline-block; } .redpajama-page .code-dots .r { background: #ef4444; } .redpajama-page .code-dots .y { background: #f59e0b; } .redpajama-page .code-dots .g { background: #10b981; } .redpajama-page .code-lang { font-size: 12px; color: #7f1d1d; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; } .redpajama-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #7f1d1d !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .markdown-body .redpajama-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #7f1d1d !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .redpajama-page .steps-row { display: flex; align-items: flex-start; justify-content: center; margin-bottom: 48px; } .redpajama-page .stp-card { flex: 1; max-width: 320px; text-align: center; padding: 0 24px; } .redpajama-page .stp-num { font-size: clamp(48px, 6vw, 72px); font-weight: 700; color: #e2e8f0; letter-spacing: -0.04em; line-height: 1; margin-bottom: 20px; } .redpajama-page .stp-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 10px; } .redpajama-page .stp-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } .redpajama-page .stp-conn { width: 60px; height: 2px; background: var(--el-border-color-light); margin-top: 36px; flex-shrink: 0; } .redpajama-page .steps-cta { text-align: center; } @media (max-width: 768px) { .redpajama-page .steps-row { flex-direction: column; align-items: 
center; gap: 32px; } .redpajama-page .stp-conn { width: 2px; height: 32px; margin: 0; } .redpajama-page .stp-card { max-width: 100%; } } .redpajama-cta { padding: 100px 0; background: #0f172a; text-align: center; position: relative; overflow: hidden; } .redpajama-cta::before { content: ''; position: absolute; top: -100px; left: 50%; transform: translateX(-50%); width: 700px; height: 400px; background: radial-gradient(ellipse, rgba(220, 38, 38, 0.12) 0%, transparent 70%); pointer-events: none; } .redpajama-cta h2 { font-size: clamp(28px, 4vw, 44px); font-weight: 700; color: #f8fafc; letter-spacing: normal; margin-bottom: 28px; position: relative; } .redpajama-cta > div > p { font-size: clamp(16px, 2vw, 18px); color: var(--el-text-color-secondary); max-width: 520px; margin: 0 auto 56px; line-height: 1.6; position: relative; } .redpajama-page .cta-actions { display: flex; gap: 12px; justify-content: center; flex-wrap: wrap; position: relative; } .redpajama-page .btn-cta-light { display: inline-flex; align-items: center; gap: 6px; padding: 14px 32px; background: #dc2626; color: #ffffff !important; border-radius: 9999px; font-size: 15px; font-weight: 700; transition: background 0.2s, transform 0.15s; text-decoration: none !important; } .redpajama-page .btn-cta-light:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; } .redpajama-page .btn-cta-ghost { display: inline-flex; align-items: center; padding: 14px 32px; background: transparent; color: #94a3b8 !important; border: 1px solid #334155; border-radius: 9999px; font-size: 15px; font-weight: 600; transition: border-color 0.2s, color 0.2s; text-decoration: none !important; } .redpajama-page .btn-cta-ghost:hover { border-color: var(--el-text-color-regular); color: #e2e8f0 !important; text-decoration: none !important; } .redpajama-page code { background: #fef2f2 !important; padding: 2px 8px !important; border-radius: 5px !important; font-size: 13px !important; font-family: 
'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; color: #dc2626 !important; border: 1px solid #fca5a5 !important; } .redpajama-page .s-text-dark { color: var(--el-text-color-primary); } .redpajama-page .s-text-brand { color: #dc2626; } .redpajama-page .s-section-body { font-size: 16px; color: var(--el-text-color-regular); line-height: 1.8; text-align: center; max-width: 680px; margin: 0 auto; } .redpajama-page .s-section-body p + p { margin-top: 16px; } .redpajama-page .tag-row { display: flex; gap: 8px; flex-wrap: wrap; justify-content: center; margin-top: 16px; } .redpajama-page .tag-item

{
padding: 4px 12px; background: var(--el-bg-color-page);
border: 1px solid var(--el-border-color-light); border-radius: 9999px;
font-size: 12px; font-weight: 600; color: var(--el-text-color-regular);
}
html.dark .redpajama-page { background: var(--el-bg-color); color: var(--el-text-color-primary); }
html.dark .redpajama-page a { color: inherit; }
html.dark .markdown-body .redpajama-page a { color: inherit !important; }
html.dark .markdown-body .redpajama-page a.s-btn-primary,
html.dark .markdown-body .redpajama-page a.btn-cta-light { color: #ffffff !important; }
html.dark .markdown-body .redpajama-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
html.dark .markdown-body .redpajama-page a.btn-cta-ghost { color: #94a3b8 !important; }
html.dark .markdown-body .redpajama-page a.btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
html.dark .redpajama-page .s-bg-white { background: var(--el-bg-color); }
html.dark .redpajama-page .s-bg-gray { background: var(--el-bg-color-page); }
html.dark .redpajama-page .s-bg-dark { background: var(--el-bg-color); }
html.dark .redpajama-page .s-header h2 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .s-header p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .s-btn-primary { background: #dc2626; color: #ffffff !important; }
html.dark .redpajama-page .s-btn-primary:hover { background: #b91c1c; }
html.dark .redpajama-page .s-btn-secondary {
background: #1e293b; color: var(--el-text-color-primary) !important;
border-color: #475569;
}
html.dark .redpajama-page .s-btn-secondary:hover { background: var(--el-border-color); border-color: var(--el-text-color-regular); }
html.dark .redpajama-hero { background: var(--el-bg-color); }
html.dark .redpajama-hero::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.15) 0%, transparent 70%);
}
html.dark .redpajama-page .hero-badge { background: var(--el-bg-color-page); border-color: var(--el-border-color); color: var(--el-text-color-secondary); }
html.dark .redpajama-hero h1 { color: var(--el-text-color-primary); }
html.dark .redpajama-hero h1 span { color: #f87171; }
html.dark .redpajama-page .hero-subtitle { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .hero-highlights .h-item { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .hero-highlights .h-div { background: var(--el-border-color); }
html.dark .redpajama-stats { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .redpajama-page .stat-val { color: var(--el-text-color-primary); }
html.dark .redpajama-page .stat-lbl { color: var(--el-text-color-regular); }
html.dark .redpajama-page .feat-card {
background: var(--el-bg-color-page); border-color: var(--el-border-color);
}
html.dark .redpajama-page .feat-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .redpajama-page .feat-card h3 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .feat-card p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .uc-card { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .redpajama-page .uc-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .redpajama-page .uc-card h3 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .uc-card p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .stp-num { color: #334155; }
html.dark .redpajama-page .stp-card h3 { color: var(--el-text-color-primary); }
html.dark .redpajama-page .stp-card p { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .stp-conn { background: var(--el-border-color); }
html.dark .redpajama-page code {
background: #450a0a !important; color: #fecaca !important; border-color: #7f1d1d !important;
}
html.dark .redpajama-page .code-wrap {
border-color: #7f1d1d !important; background: #450a0a !important;
}
html.dark .redpajama-page .code-bar {
background: #7f1d1d !important; border-bottom-color: #991b1b !important;
}
html.dark .redpajama-page .code-block {
color: #fecaca !important;
}
html.dark .redpajama-page .code-lang { color: #fecaca; }
html.dark .redpajama-page .s-text-dark { color: var(--el-text-color-primary); }
html.dark .redpajama-page .s-text-brand { color: #f87171; }
html.dark .redpajama-page .s-section-body { color: var(--el-text-color-secondary); }
html.dark .redpajama-page .tag-item { background: var(--el-border-color); border-color: var(--el-text-color-regular); color: var(--el-text-color-secondary); }
html.dark .redpajama-cta { background: #020617; }
html.dark .redpajama-cta::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.2) 0%, transparent 70%);
}
html.dark .redpajama-page .btn-cta-light { color: #ffffff !important; }
html.dark .redpajama-page .btn-cta-ghost { color: #94a3b8 !important; }
html.dark .redpajama-page .btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
</style>
<div class="redpajama-page">
<section class="redpajama-hero">
<div class="s-container-narrow">
<div class="hero-badge">
<span class="badge-dot"></span>
RedPajama-Data-1T
</div>
<h1>
RedPajama<br/><span>Data-1T</span>
</h1>
<p class="hero-subtitle">
RedPajama-Data-1T is an open reproduction of the LLaMA training dataset, created by Together AI. It contains 1.2 trillion tokens drawn from seven data sources (CommonCrawl, C4, GitHub, Wikipedia, Books, ArXiv, and StackExchange) and is released under the Apache 2.0 license to support transparent, reproducible training of large language models.

1.2 trillion tokens | 7 major data sources | Apache 2.0 license | Together AI

📊
1.2T
Total Tokens

🗂️
7
Data Sources

📜
Apache 2.0
Open License Agreement

🧪
LLaMA
Training Recipe Reproduction

Dataset Highlights

An open, transparent, trillion-token pre-training dataset that supports research and development of large language models.

🌐

Trillion Token Scale

Contains 1.2 trillion tokens, comparable in scale to LLaMA's original training data, providing ample data for pre-training large models.

📚

Seven Major Data Sources

Covers CommonCrawl, C4, GitHub, Wikipedia, Books, ArXiv, and StackExchange, spanning web pages, code, encyclopedic content, books, academic papers, and Q&A.

🔍

Transparent Processing Workflow

The data processing and filtering pipelines are fully documented, so every operation can be traced and audited and the data's sources and quality stay transparent.

⚖️

Apache 2.0 License

Released under the permissive Apache 2.0 open-source license, which allows both academic research and commercial use.

🧹

Quality Filtering

Each data source is cleaned with domain-specific rules, including deduplication, language detection, and quality scoring.

🔄

Fully Reproducible

The complete methodology and processing workflows are open source, so researchers can reproduce, customize, and extend the dataset to meet their needs.
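The Quality Filtering highlight above mentions deduplication and quality checks. A minimal sketch of that idea follows. It is not the RedPajama pipeline itself: the function names, the case and whitespace normalization, and the `min_chars` threshold are assumptions chosen for illustration, and the sketch covers only exact-duplicate removal and a length check.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def filter_documents(docs, min_chars=20):
    """Drop very short documents and exact duplicates (after normalization).

    `min_chars` is an illustrative threshold, not a value from RedPajama.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm) < min_chars:
            continue  # too short to be useful
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        kept.append(doc)
    return kept


if __name__ == "__main__":
    sample = [
        "RedPajama is an open dataset for language models.",
        "redpajama   is an open dataset for language models.",
        "Too short.",
        "ArXiv papers cover physics, math, and computer science.",
    ]
    for doc in filter_documents(sample):
        print(doc)
```

Hashing the normalized text keeps memory use proportional to the number of distinct documents rather than their total size; near-duplicate detection would need a different technique and is outside this sketch.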

Use Cases

From model pre-training to data research, covering a range of large language model development scenarios.

🧠

LLM Pre-training

Train large language models from scratch on a validated data recipe, with the aim of reproducing LLaMA-level training results.

📈

Data Ablation Experiments

Study how different data sources affect model performance and quantify the contribution of each domain.

📋

Curriculum Learning

Design multi-stage training curricula across data domains, and tune data mixing ratios and training schedules.

🔬

Model Comparison

Train on standardized data to compare model architectures fairly, removing data differences as a confounding factor.

NLP pre-training LLaMA open-source trillion-tokens
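For the data ablation and curriculum learning scenarios above, one common building block is a set of per-source mixing weights that can be renormalized after dropping a source and then used to sample which source each training document comes from. The sketch below is only an illustration: the weights are placeholders (not RedPajama's actual token proportions), and the lowercase source keys and helper names are hypothetical.

```python
import random

# Placeholder mixing weights for illustration only; these are NOT the
# actual token proportions of RedPajama-Data-1T.
WEIGHTS = {
    "commoncrawl": 0.60,
    "c4": 0.15,
    "github": 0.05,
    "wikipedia": 0.05,
    "books": 0.05,
    "arxiv": 0.05,
    "stackexchange": 0.05,
}


def ablate(weights, dropped):
    """Remove one source and renormalize the rest so they sum to 1."""
    remaining = {k: v for k, v in weights.items() if k != dropped}
    total = sum(remaining.values())
    return {k: v / total for k, v in remaining.items()}


def sample_sources(weights, n, seed=0):
    """Draw n source names in proportion to their mixing weights."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


if __name__ == "__main__":
    no_github = ablate(WEIGHTS, "github")
    print(no_github)
    print(sample_sources(no_github, n=10))
```

A multi-stage curriculum could reuse `sample_sources` with a different weight dictionary per stage; how the stages are scheduled is left open here.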

Quick Start

Access the RedPajama dataset through the API

Python

import requests

url = "https://api.acedata.cloud/datasets/redpajama"
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",
    "Content-Type": "application/json",
}
params = {"source": "wikipedia", "limit": 10}

# Fail fast on network stalls and HTTP errors instead of parsing an error body
response = requests.get(url, headers=headers, params=params, timeout=30)
response.raise_for_status()
data = response.json()

# Print the returned data entries
for item in data.get("data", []):
    print(item.get("text", "")[:200])
    print("---")
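The response handling in the Quick Start can be made a little more defensive. The sketch below assumes the response body has the shape the example reads from (a top-level `data` list whose items carry a `text` field). The helper name `preview_entries` and the sample payload are made up for illustration, and the real API may return additional or different fields.

```python
def preview_entries(payload, width=200):
    """Return a truncated text preview for each entry in an API payload.

    Entries without a non-empty string `text` field are skipped
    rather than raising an exception.
    """
    previews = []
    for item in payload.get("data") or []:
        text = item.get("text") if isinstance(item, dict) else None
        if isinstance(text, str) and text:
            previews.append(text[:width])
    return previews


if __name__ == "__main__":
    # Hypothetical payload, shaped like the Quick Start example expects.
    sample = {"data": [{"text": "Alan Turing was a mathematician."}, {"id": 7}]}
    for line in preview_entries(sample, width=11):
        print(line)
        print("---")
```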

Get Started in 3 Steps

From registration to first request, you can start accessing trillion-token pre-training data in just a few minutes.

01

Register an Account

Create an Ace Data Cloud account at platform.acedata.cloud to complete developer onboarding.

02

Obtain an API Key

Create an API key in the console. It authenticates your requests and authorizes data access.

03

Start Using the Dataset API

Access the RedPajama-Data-1T dataset through the API to query and download pre-training data from the seven data sources as needed.
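Steps 02 and 03 can be tied together with a small helper that reads the API key from the environment and builds the request without sending it. This is a sketch under assumptions: the environment variable name `ACEDATA_API_TOKEN` and the function name `build_request` are hypothetical, the endpoint and the `source` and `limit` parameters are taken from the Quick Start example, and only the `wikipedia` source value appears there. It uses the standard library's `urllib` rather than `requests` so that it runs without extra dependencies.

```python
import os
import urllib.parse
import urllib.request

API_URL = "https://api.acedata.cloud/datasets/redpajama"  # from the Quick Start


def build_request(source, limit=10, token=None):
    """Build (but do not send) an authenticated GET request.

    The token comes from the argument or, if omitted, from the
    hypothetical ACEDATA_API_TOKEN environment variable.
    """
    token = token or os.environ.get("ACEDATA_API_TOKEN")
    if not token:
        raise ValueError("No API token provided")
    query = urllib.parse.urlencode({"source": source, "limit": limit})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    req = build_request("wikipedia", limit=10, token="YOUR_API_TOKEN")
    print(req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=30)
```

Keeping the key in an environment variable rather than in source code avoids committing it to version control by accident.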

Start Exploring the RedPajama Dataset

Openly licensed, trillion-token scale, fully transparent. Whether you are training large language models or conducting data research, RedPajama-Data-1T is an ideal choice.