<style>
.favdbench-page * { box-sizing: border-box; }
.favdbench-page h1, .favdbench-page h2, .favdbench-page h3, .favdbench-page h4, .favdbench-page h5, .favdbench-page h6, .favdbench-page p, .favdbench-page ul, .favdbench-page ol, .favdbench-page li, .favdbench-page pre, .favdbench-page blockquote, .favdbench-page table, .favdbench-page td, .favdbench-page th { margin: 0; padding: 0; }
.favdbench-page {
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
color: var(--el-text-color-primary);
background: var(--el-bg-color);
line-height: 1.6;
}
.favdbench-page a { text-decoration: none; color: inherit; }
.favdbench-page a:hover { text-decoration: none; }
.favdbench-page ul { list-style: none; }
.markdown-body .favdbench-page a { color: inherit !important; text-decoration: none !important; }
.markdown-body .favdbench-page a:hover { text-decoration: none !important; }
.markdown-body .favdbench-page a.s-btn-primary,
.markdown-body .favdbench-page a.btn-cta-light { color: #ffffff !important; }
.markdown-body .favdbench-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
.markdown-body .favdbench-page a.btn-cta-ghost { color: #94a3b8 !important; }
.markdown-body .favdbench-page a.btn-cta-ghost:hover { color: #e2e8f0 !important; }
.markdown-body .favdbench-page h1, .markdown-body .favdbench-page h2 { border-bottom: none !important; padding-bottom: 0 !important; }
.favdbench-page .s-container { max-width: 1200px; margin: 0 auto; padding: 0 24px; }
.favdbench-page .s-container-narrow { max-width: 800px; margin: 0 auto; padding: 0 24px; }
.favdbench-page .s-container-wide { max-width: 1100px; margin: 0 auto; padding: 0 32px; }
.favdbench-page .s-section { padding: 80px 0; }
.favdbench-page .s-section-lg { padding: 100px 0; }
.favdbench-page .s-section-sm { padding: 48px 0; }
.favdbench-page .s-bg-white { background: var(--el-bg-color); }
.favdbench-page .s-bg-gray { background: var(--el-bg-color-page); }
.favdbench-page .s-bg-dark { background: #0f172a; color: #f8fafc; }
.favdbench-page .s-header { text-align: center; margin-bottom: 64px; }
.favdbench-page .s-header h2 {
font-size: clamp(28px, 4vw, 40px);
font-weight: 700;
color: var(--el-text-color-primary);
letter-spacing: normal;
margin-bottom: 20px;
line-height: 1.15;
}
.favdbench-page .s-header p {
font-size: clamp(16px, 2vw, 18px);
color: var(--el-text-color-regular);
max-width: 640px;
margin: 0 auto;
line-height: 1.6;
}
.favdbench-page .s-bg-dark .s-header h2 { color: #f8fafc; }
.favdbench-page .s-bg-dark .s-header p { color: var(--el-text-color-secondary); }
.favdbench-page .s-btn-primary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: #dc2626; color: #ffffff !important;
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: background 0.2s, transform 0.15s;
border: none; cursor: pointer;
text-decoration: none !important;
}
.favdbench-page .s-btn-primary:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; }
.favdbench-page .s-btn-secondary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: var(--el-bg-color); color: var(--el-text-color-primary) !important;
border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: border-color 0.2s, background 0.2s;
cursor: pointer;
text-decoration: none !important;
}
.favdbench-page .s-btn-secondary:hover { background: var(--el-bg-color-page); text-decoration: none !important; }
.favdbench-hero {
padding: 100px 0 80px;
text-align: center;
background: var(--el-bg-color);
position: relative;
overflow: hidden;
}
.favdbench-hero::before {
content: '';
position: absolute;
top: -200px; left: 50%;
transform: translateX(-50%);
width: 900px; height: 500px;
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.06) 0%, transparent 70%);
pointer-events: none;
}
.favdbench-page .hero-badge {
display: inline-flex; align-items: center; gap: 8px;
padding: 6px 16px;
background: var(--el-bg-color-page); border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 13px; font-weight: 600; color: var(--el-text-color-regular);
margin-bottom: 28px;
}
.favdbench-page .hero-badge .badge-dot {
width: 6px; height: 6px; background: #10b981; border-radius: 50%;
display: inline-block;
}
.favdbench-hero h1 {
font-size: clamp(36px, 5vw, 60px);
font-weight: 700; line-height: 1.05;
letter-spacing: normal; color: var(--el-text-color-primary);
margin-bottom: 20px;
position: relative;
}
.favdbench-hero h1 span { color: #dc2626; }
.favdbench-page .hero-subtitle {
font-size: clamp(16px, 2vw, 20px);
color: var(--el-text-color-regular); line-height: 1.6;
max-width: 620px; margin: 0 auto 56px;
position: relative;
}
.favdbench-page .hero-actions {
display: flex; gap: 12px; justify-content: center;
flex-wrap: wrap; margin-bottom: 56px; position: relative;
}
.favdbench-page .hero-highlights {
display: flex; align-items: center; justify-content: center;
gap: 16px; flex-wrap: wrap; position: relative;
}
.favdbench-page .hero-highlights .h-item { font-size: 14px; color: var(--el-text-color-regular); font-weight: 500; }
.favdbench-page .hero-highlights .h-div { width: 1px; height: 16px; background: var(--el-border-color-light); }
@media (max-width: 640px)
{ .favdbench-page .hero-highlights .h-div { display: none; } .favdbench-page .hero-highlights { gap: 8px 16px; } .favdbench-page .hero-actions { flex-direction: column; align-items: center; } .favdbench-page .hero-actions a { width: 100%; max-width: 280px; justify-content: center; } } .favdbench-page .hero-cover { max-width: 720px; margin: 48px auto 0; border-radius: 16px; overflow: hidden; box-shadow: 0 8px 32px rgba(0,0,0,0.10); } .favdbench-page .hero-cover img { width: 100%; height: auto; display: block; } .favdbench-stats { padding: 48px 0; background: var(--el-bg-color-page); border-top: 1px solid var(--el-border-color-lighter); border-bottom: 1px solid var(--el-border-color-lighter); } .favdbench-page .stats-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 32px; text-align: center; } .favdbench-page .stat-icon { font-size: 28px; margin-bottom: 12px; } .favdbench-page .stat-val { font-size: clamp(28px, 4vw, 40px); font-weight: 700; color: var(--el-text-color-primary); letter-spacing: normal; margin-bottom: 4px; } .favdbench-page .stat-lbl { font-size: 14px; color: var(--el-text-color-secondary); font-weight: 500; } @media (max-width: 768px) { .favdbench-page .stats-grid { grid-template-columns: repeat(2, 1fr); gap: 24px; } } @media (max-width: 480px) { .favdbench-page .stats-grid { grid-template-columns: 1fr; gap: 20px; } } .favdbench-page .features-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px; } .favdbench-page .feat-card { padding: 32px 28px; border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); background: var(--el-bg-color); transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .favdbench-page .feat-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .favdbench-page .feat-icon { font-size: 32px; margin-bottom: 16px; } .favdbench-page .feat-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } 
.favdbench-page .feat-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .favdbench-page .features-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 640px) { .favdbench-page .features-grid { grid-template-columns: 1fr; } } .favdbench-page .usecases-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 20px; } .favdbench-page .uc-card { padding: 28px 24px; background: var(--el-bg-color); border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); text-align: center; transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .favdbench-page .uc-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .favdbench-page .uc-icon { font-size: 36px; margin-bottom: 16px; } .favdbench-page .uc-card h3 { font-size: 17px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } .favdbench-page .uc-card p { font-size: 14px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .favdbench-page .usecases-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 480px) { .favdbench-page .usecases-grid { grid-template-columns: 1fr; } } .favdbench-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; max-width: 860px; margin: 0 auto; } .markdown-body .favdbench-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; } .favdbench-page .code-bar { display: flex !important; align-items: center !important; justify-content: space-between !important; padding: 12px 20px !important; background: #1e293b !important; border-bottom: 1px solid #334155 !important; } .favdbench-page .code-dots { display: flex; gap: 6px; } .favdbench-page .code-dots i { width: 10px; height: 10px; border-radius: 50%; 
display: inline-block; } .favdbench-page .code-dots .r { background: #ef4444; } .favdbench-page .code-dots .y { background: #f59e0b; } .favdbench-page .code-dots .g { background: #10b981; } .favdbench-page .code-lang { font-size: 12px; color: var(--el-text-color-secondary); font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; } .favdbench-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .markdown-body .favdbench-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .favdbench-page .steps-row { display: flex; align-items: flex-start; justify-content: center; margin-bottom: 48px; } .favdbench-page .stp-card { flex: 1; max-width: 320px; text-align: center; padding: 0 24px; } .favdbench-page .stp-num { font-size: clamp(48px, 6vw, 72px); font-weight: 700; color: #e2e8f0; letter-spacing: -0.04em; line-height: 1; margin-bottom: 20px; } .favdbench-page .stp-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 10px; } .favdbench-page .stp-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } .favdbench-page .stp-conn { width: 60px; height: 2px; background: var(--el-border-color-light); margin-top: 36px; flex-shrink: 0; } .favdbench-page .steps-cta { text-align: center; } @media (max-width: 768px) { .favdbench-page .steps-row { flex-direction: 
column; align-items: center; gap: 32px; } .favdbench-page .stp-conn { width: 2px; height: 32px; margin: 0; } .favdbench-page .stp-card { max-width: 100%; } } .favdbench-cta { padding: 100px 0; background: #0f172a; text-align: center; position: relative; overflow: hidden; } .favdbench-cta::before { content: ''; position: absolute; top: -100px; left: 50%; transform: translateX(-50%); width: 700px; height: 400px; background: radial-gradient(ellipse, rgba(220, 38, 38, 0.12) 0%, transparent 70%); pointer-events: none; } .favdbench-cta h2 { font-size: clamp(28px, 4vw, 44px); font-weight: 700; color: #f8fafc; letter-spacing: normal; margin-bottom: 28px; position: relative; } .favdbench-cta > div > p { font-size: clamp(16px, 2vw, 18px); color: var(--el-text-color-secondary); max-width: 520px; margin: 0 auto 56px; line-height: 1.6; position: relative; } .favdbench-page .cta-actions { display: flex; gap: 12px; justify-content: center; flex-wrap: wrap; position: relative; } .favdbench-page .btn-cta-light { display: inline-flex; align-items: center; gap: 6px; padding: 14px 32px; background: #dc2626; color: #ffffff !important; border-radius: 9999px; font-size: 15px; font-weight: 700; transition: background 0.2s, transform 0.15s; text-decoration: none !important; } .favdbench-page .btn-cta-light:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; } .favdbench-page .btn-cta-ghost { display: inline-flex; align-items: center; padding: 14px 32px; background: transparent; color: #94a3b8 !important; border: 1px solid #334155; border-radius: 9999px; font-size: 15px; font-weight: 600; transition: border-color 0.2s, color 0.2s; text-decoration: none !important; } .favdbench-page .btn-cta-ghost:hover { border-color: var(--el-text-color-regular); color: #e2e8f0 !important; text-decoration: none !important; } .favdbench-page code { background: #fee2e2 !important; padding: 2px 8px !important; border-radius: 5px !important; font-size: 13px !important; 
font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; color: #b91c1c !important; border: 1px solid #fecaca !important; } .favdbench-page .s-text-dark { color: var(--el-text-color-primary); } .favdbench-page .s-text-brand { color: #dc2626; } .favdbench-page .s-section-body { font-size: 16px; color: var(--el-text-color-regular); line-height: 1.8; text-align: center; max-width: 680px; margin: 0 auto; } .favdbench-page .s-section-body p + p { margin-top: 16px; } .favdbench-page .tag-row { display: flex; gap: 8px; flex-wrap: wrap; justify-content: center; margin-top: 16px; } .favdbench-page .tag-item
{
padding: 4px 12px; background: var(--el-bg-color-page);
border: 1px solid var(--el-border-color-light); border-radius: 9999px;
font-size: 12px; font-weight: 600; color: var(--el-text-color-regular);
}
html.dark .favdbench-page { background: var(--el-bg-color); color: var(--el-text-color-primary); }
html.dark .favdbench-page a { color: inherit; }
html.dark .markdown-body .favdbench-page a { color: inherit !important; }
html.dark .markdown-body .favdbench-page a.s-btn-primary,
html.dark .markdown-body .favdbench-page a.btn-cta-light { color: #ffffff !important; }
html.dark .markdown-body .favdbench-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
html.dark .markdown-body .favdbench-page a.btn-cta-ghost { color: #94a3b8 !important; }
html.dark .markdown-body .favdbench-page a.btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
html.dark .favdbench-page .s-bg-white { background: var(--el-bg-color); }
html.dark .favdbench-page .s-bg-gray { background: var(--el-bg-color-page); }
html.dark .favdbench-page .s-bg-dark { background: var(--el-bg-color); }
html.dark .favdbench-page .s-header h2 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .s-header p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .s-btn-primary { background: #dc2626; color: #ffffff !important; }
html.dark .favdbench-page .s-btn-primary:hover { background: #b91c1c; }
html.dark .favdbench-page .s-btn-secondary {
background: #1e293b; color: var(--el-text-color-primary) !important;
border-color: #475569;
}
html.dark .favdbench-page .s-btn-secondary:hover { background: var(--el-border-color); border-color: var(--el-text-color-regular); }
html.dark .favdbench-hero { background: var(--el-bg-color); }
html.dark .favdbench-hero::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.15) 0%, transparent 70%);
}
html.dark .favdbench-page .hero-badge { background: var(--el-bg-color-page); border-color: var(--el-border-color); color: var(--el-text-color-secondary); }
html.dark .favdbench-hero h1 { color: var(--el-text-color-primary); }
html.dark .favdbench-hero h1 span { color: #f87171; }
html.dark .favdbench-page .hero-subtitle { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .hero-highlights .h-item { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .hero-highlights .h-div { background: var(--el-border-color); }
html.dark .favdbench-stats { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .favdbench-page .stat-val { color: var(--el-text-color-primary); }
html.dark .favdbench-page .stat-lbl { color: var(--el-text-color-regular); }
html.dark .favdbench-page .feat-card {
background: var(--el-bg-color-page); border-color: var(--el-border-color);
}
html.dark .favdbench-page .feat-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .favdbench-page .feat-card h3 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .feat-card p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .uc-card { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .favdbench-page .uc-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .favdbench-page .uc-card h3 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .uc-card p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .stp-num { color: #334155; }
html.dark .favdbench-page .stp-card h3 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .stp-card p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .stp-conn { background: var(--el-border-color); }
html.dark .favdbench-page code {
background: #7f1d1d !important; color: #fca5a5 !important; border-color: #dc2626 !important;
}
html.dark .favdbench-page .s-text-dark { color: var(--el-text-color-primary); }
html.dark .favdbench-page .s-text-brand { color: #f87171; }
html.dark .favdbench-page .s-section-body { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .tag-item { background: var(--el-border-color); border-color: var(--el-text-color-regular); color: var(--el-text-color-secondary); }
html.dark .favdbench-cta { background: #020617; }
html.dark .favdbench-cta::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.2) 0%, transparent 70%);
}
html.dark .favdbench-page .btn-cta-light { color: #ffffff !important; }
html.dark .favdbench-page .btn-cta-ghost { color: #94a3b8 !important; }
html.dark .favdbench-page .btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
</style>
<div class="favdbench-page">
<section class="favdbench-hero">
<div class="s-container-narrow">
<div class="hero-badge">
<span class="badge-dot"></span>
FAVDBench Dataset
</div>
<h1>
FAVDBench<br/><span>Dataset</span>
</h1>
<p class="hero-subtitle">
FAVDBench (Fine-grained Audible Video Description Benchmark) is a benchmark dataset for fine-grained audio-visual description, proposed at CVPR 2023. It aims to provide detailed textual descriptions for audible videos, covering object appearance, spatial location, actions, and sounds.
</p>
</div>
</section>
<section class="s-section s-bg-white">
<div class="s-container">
<div class="s-header">
<h2>Dataset Highlights</h2>
<p>A fine-grained description benchmark for audio-visual understanding, pushing the frontier of multimodal research</p>
</div>
<div class="features-grid">
<div class="feat-card">
<h3>Audio-Visual Fusion</h3>
<p>Covers both visual and auditory information. It is one of the few video description benchmarks that include audio descriptions, which supports cross-modal research.</p>
</div>
<div class="feat-card">
<h3>Fine-grained Descriptions</h3>
<p>Provides fine-grained textual annotations across five dimensions: Appearance, Spatial, Temporal, Action, and Audio.</p>
</div>
<div class="feat-card">
<h3>Multimodal Annotations</h3>
<p>Each video carries high-quality human annotations across multiple dimensions, making the dataset suitable for training and evaluating multimodal generation models.</p>
</div>
<div class="feat-card">
<h3>Academic Benchmark</h3>
<p>Introduced in a CVPR 2023 paper and widely cited in academia, it serves as a standard evaluation benchmark for audio-visual description tasks.</p>
</div>
<div class="feat-card">
<h3>Diverse Content</h3>
<p>Videos span a range of scenes and themes, including natural scenery, human activities, and animal behavior, allowing a broad assessment of model generalization.</p>
</div>
<div class="feat-card">
<h3>Openly Available</h3>
<p>Released under an open license, so researchers can download and use it freely, lowering the barrier to academic research and industrial applications.</p>
</div>
</div>
</div>
</section>
<section class="s-section s-bg-gray">
<div class="s-container">
<div class="s-header">
<h2>Applicable Scenarios</h2>
<p>From academic research to industrial applications, supporting audio-visual understanding technologies</p>
</div>
<div class="usecases-grid">
<div class="uc-card">
<h3>Video Description Generation</h3>
<p>Train and evaluate video description models that automatically generate multidimensional natural language descriptions for videos.</p>
</div>
<div class="uc-card">
<h3>Audio-Visual Understanding</h3>
<p>Study the joint understanding of visual and auditory information, and explore cross-modal semantic alignment and fusion methods.</p>
</div>
<div class="uc-card">
<h3>Multimodal Research</h3>
<p>Provide high-quality training and evaluation data for vision-language-audio tri-modal pre-training models.</p>
</div>
<div class="uc-card">
<h3>Video Subtitle Generation</h3>
<p>Develop automatic video subtitle systems that improve the accessibility and retrievability of video content.</p>
</div>
</div>
</div>
</section>
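<section class="s-section s-bg-white">
<div class="s-container">
<div class="s-header">
<h2>Data Preview</h2>
<p>Below is an annotation example from the FAVDBench dataset, with fine-grained descriptions across five dimensions</p>
</div>
<div class="code-wrap">
<div class="code-bar">
<span class="code-dots"><i class="r"></i><i class="y"></i><i class="g"></i></span>
<span class="code-lang">JSON</span>
</div>
<pre class="code-block">{
  "video_id": "video_00123",
  "descriptions": {
    "appearance": "A brown dog with floppy ears and a red collar stands on green grass.",
    "spatial": "The dog is positioned in the center of the frame with trees in the background.",
    "temporal": "The video starts with the dog sitting, then it stands up and begins to walk.",
    "action": "The dog wags its tail, barks twice, and runs toward the camera.",
    "audio": "Birds chirping in the background, followed by two loud barks and rustling grass."
  },
  "duration": 8.5,
  "split": "train"
}</pre>
</div>
</div>
</section>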
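
As a rough illustration of working with a record shaped like the preview, the sketch below parses it, checks that all five description dimensions are present, and joins them into one paragraph. The helper names `missing_dimensions` and `flatten_descriptions` are hypothetical and not part of any FAVDBench tooling, and it assumes every record follows the field layout of the preview.

```python
import json

# The five description dimensions named in the preview record.
DIMENSIONS = ("appearance", "spatial", "temporal", "action", "audio")

# The preview record, embedded as a string so the sketch runs on its own.
SAMPLE_RECORD = """
{
  "video_id": "video_00123",
  "descriptions": {
    "appearance": "A brown dog with floppy ears and a red collar stands on green grass.",
    "spatial": "The dog is positioned in the center of the frame with trees in the background.",
    "temporal": "The video starts with the dog sitting, then it stands up and begins to walk.",
    "action": "The dog wags its tail, barks twice, and runs toward the camera.",
    "audio": "Birds chirping in the background, followed by two loud barks and rustling grass."
  },
  "duration": 8.5,
  "split": "train"
}
"""


def missing_dimensions(record):
    """Return the dimension names that are absent or empty in a record."""
    descriptions = record.get("descriptions", {})
    return [name for name in DIMENSIONS if not descriptions.get(name)]


def flatten_descriptions(record):
    """Join the available descriptions into one paragraph, in a fixed order."""
    descriptions = record["descriptions"]
    return " ".join(descriptions[name] for name in DIMENSIONS if name in descriptions)


record = json.loads(SAMPLE_RECORD)
print(record["video_id"], "missing:", missing_dimensions(record))
print(flatten_descriptions(record))
```

Keeping the dimension order in one tuple means the validation and the flattened text always agree on which fields count as a complete annotation.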
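<section class="s-section s-bg-gray">
<div class="s-container">
<div class="s-header">
<h2>Get Started in 3 Steps</h2>
<p>From browsing to first use, you can start your multimodal research in just a few minutes</p>
</div>
<div class="steps-row">
<div class="stp-card">
<div class="stp-num">1</div>
<h3>Browse the Dataset</h3>
<p>View the dataset details on the Ace Data Cloud platform to review metadata such as annotation format, data scale, and license.</p>
</div>
<div class="stp-conn"></div>
<div class="stp-card">
<div class="stp-num">2</div>
<h3>Download Data</h3>
<p>Download the video files and JSON annotation data. The dataset provides standard training, validation, and test splits.</p>
</div>
<div class="stp-conn"></div>
<div class="stp-card">
<div class="stp-num">3</div>
<h3>Load and Use</h3>
<p>Use <code>json.load()</code> to load the annotation data, then train and evaluate multimodal models using a video processing library.</p>
</div>
</div>
</div>
</section>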
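
The loading step can be sketched as follows. This is a minimal example under stated assumptions, not the dataset's official loader: it assumes the annotations ship as a single JSON array of records with a `split` field, the file name `annotations.json` is hypothetical, and the split labels `val` and `test` are guesses (only `train` appears in the preview). The stand-in records are written to a temporary file so the sketch runs without the real download.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def load_annotations(path):
    """Load annotation records from a JSON file (assumed to hold an array)."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of annotation records")
    return records


def group_by_split(records):
    """Group records by their "split" field; records without one go to "unknown"."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("split", "unknown")].append(record)
    return dict(groups)


# Stand-in data (not real FAVDBench entries) so the sketch is self-contained.
demo_records = [
    {"video_id": "video_00123", "duration": 8.5, "split": "train"},
    {"video_id": "video_00124", "duration": 6.0, "split": "val"},
    {"video_id": "video_00125", "duration": 7.25, "split": "test"},
    {"video_id": "video_00126", "duration": 9.0, "split": "train"},
]

with tempfile.TemporaryDirectory() as workdir:
    annotation_path = Path(workdir) / "annotations.json"  # hypothetical file name
    annotation_path.write_text(json.dumps(demo_records), encoding="utf-8")
    groups = group_by_split(load_annotations(annotation_path))

# Report how many videos and how much footage each split contains.
for split_name, items in groups.items():
    total_seconds = sum(item["duration"] for item in items)
    print(f"{split_name}: {len(items)} videos, {total_seconds:.1f} s")
```

With the real download, point `load_annotations` at the actual annotation file and adjust the field names if the released format differs from the preview.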
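<section class="favdbench-cta">
<div class="s-container-narrow">
<h2>Start Exploring the FAVDBench Dataset</h2>
<p>The CVPR 2023 Fine-grained Audible Video Description Benchmark, released under an open license and available for immediate download. Whether you are a multimodal researcher or a video understanding engineer, this dataset is worth a try.</p>
</div>
</section>
</div>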
