Emilia Dataset

Emilia is a large-scale multilingual speech generation dataset with more than 100,000 hours of speech in languages including English, Chinese, German, French, Japanese, and Korean. It covers a wide range of speakers and recording scenarios, which makes it suitable for speech synthesis and voice cloning research.

🎙️ 100,000+ hours of speech
🌍 6+ supported languages
👥 50,000+ speakers
📜 Open license: CC BY-NC 4.0

Dataset Highlights

A large-scale multilingual speech dataset that provides a solid foundation for speech synthesis and cloning research

📊

Ultra-large scale

Contains over 100,000 hours of speech data, making it one of the largest open-source speech generation datasets and providing ample data for large-scale model training.

🌐

Multilingual coverage

Covers multiple languages, including English (EN), Chinese (ZH), German (DE), French (FR), Japanese (JA), and Korean (KO), supporting cross-lingual speech research.

👥

Diversity of speakers

Features voice samples from over 50,000 different speakers across a range of ages, genders, and accents, which helps models trained on it generalize to unseen voices.

🎧

Natural speech recording

The speech data is sourced from natural recordings in real scenarios, covering various styles such as conversations, speeches, and audiobooks, with high naturalness and expressiveness.

📝

High-quality annotations

Each segment of speech is accompanied by precise text transcriptions, speaker identifiers, language tags, and duration information, with standardized annotations for direct use in model training.

🔧

Open-source processing pipeline

Accompanied by the open-source data processing tool Emilia-Pipe, supporting end-to-end processing from raw audio to training data, allowing for the reproduction of the dataset construction process.

Applicable Scenarios

From speech synthesis to speaker verification, covering core research directions in speech AI

🗣️

Text-to-speech

Train high-quality TTS models to generate natural, fluent, and expressive synthetic speech

🎭

Voice cloning

Utilize rich speaker data to achieve few-shot or zero-shot voice cloning, replicating the target speaker's timbre

🌏

Speech translation

Leverage multilingual speech data to build end-to-end speech translation systems for cross-language speech conversion

🔐

Speaker verification

Use large-scale speaker data to train voiceprint recognition models, enhancing speaker verification and recognition accuracy
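
For the voice cloning and speaker verification scenarios above, the speaker identifiers are what make the data usable: utterances from the same speaker form positive pairs, and utterances from different speakers form negative pairs. The following is a minimal sketch of building such verification trials, assuming each metadata record is a dictionary with `id` and `speaker_id` fields. The function name `build_trials` and the demo IDs are hypothetical and not part of any Emilia tooling.

```python
import itertools
import random
from collections import defaultdict


def build_trials(records, num_negative, seed=0):
    """Return (id_a, id_b, same_speaker) tuples built from metadata records."""
    # Group utterance IDs by the speaker who produced them.
    by_speaker = defaultdict(list)
    for rec in records:
        by_speaker[rec["speaker_id"]].append(rec["id"])

    # Positive trials: every pair of utterances from the same speaker.
    trials = [
        (a, b, True)
        for ids in by_speaker.values()
        for a, b in itertools.combinations(ids, 2)
    ]

    # Negative trials: one random utterance from each of two distinct speakers.
    rng = random.Random(seed)
    speakers = sorted(by_speaker)
    if len(speakers) >= 2:
        for _ in range(num_negative):
            spk_a, spk_b = rng.sample(speakers, 2)
            trials.append(
                (rng.choice(by_speaker[spk_a]), rng.choice(by_speaker[spk_b]), False)
            )
    return trials


# Hypothetical records for illustration only.
demo_records = [
    {"id": "utt_a1", "speaker_id": "spk_a"},
    {"id": "utt_a2", "speaker_id": "spk_a"},
    {"id": "utt_b1", "speaker_id": "spk_b"},
]
for trial in build_trials(demo_records, num_negative=2):
    print(trial)
```

Enumerating all same-speaker pairs grows quadratically with the number of utterances per speaker, so at the scale of this dataset the positive pairs would likely need to be sampled rather than listed exhaustively.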


Data Preview

Below are two typical metadata records from the Emilia dataset, in JSON format.

JSON
{
  "id": "emilia_en_00012345",
  "speaker_id": "spk_en_04821",
  "language": "en",
  "duration": 8.72,
  "sample_rate": 24000,
  "transcription": "The weather today is absolutely beautiful, perfect for a walk in the park.",
  "gender": "female",
  "source": "audiobook",
  "audio_path": "en/subset_001/emilia_en_00012345.wav"
}
# Another English sample
{
  "id": "emilia_en_00098765",
  "speaker_id": "spk_en_01234",
  "language": "en",
  "duration": 6.35,
  "sample_rate": 24000,
  "transcription": "Welcome to our program, today we will discuss the latest developments in artificial intelligence.",
  "gender": "male",
  "source": "podcast",
  "audio_path": "en/subset_003/emilia_en_00098765.wav"
}
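
Records like these can be parsed into a typed structure before training, so that missing or malformed fields surface early. The sketch below uses only the Python standard library and assumes every record carries exactly the nine fields shown in the preview. The `Utterance` class and the helper names are illustrative and are not part of the dataset or of Emilia-Pipe.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One metadata record, with the fields shown in the preview."""
    id: str
    speaker_id: str
    language: str
    duration: float      # seconds
    sample_rate: int     # Hz
    transcription: str
    gender: str
    source: str
    audio_path: str      # path relative to the dataset root


def parse_utterance(raw: str) -> Utterance:
    """Parse one JSON record and check its basic invariants."""
    fields = json.loads(raw)
    # Raises TypeError if a key is missing or an unexpected key is present.
    utt = Utterance(**fields)
    if utt.duration <= 0:
        raise ValueError(f"{utt.id}: duration must be positive")
    if utt.sample_rate <= 0:
        raise ValueError(f"{utt.id}: sample_rate must be positive")
    return utt


def expected_num_samples(utt: Utterance) -> int:
    """Number of audio samples implied by duration and sample rate."""
    return round(utt.duration * utt.sample_rate)


SAMPLE = """
{
  "id": "emilia_en_00012345",
  "speaker_id": "spk_en_04821",
  "language": "en",
  "duration": 8.72,
  "sample_rate": 24000,
  "transcription": "The weather today is absolutely beautiful, perfect for a walk in the park.",
  "gender": "female",
  "source": "audiobook",
  "audio_path": "en/subset_001/emilia_en_00012345.wav"
}
"""

utt = parse_utterance(SAMPLE)
print(f"{utt.id}: {utt.duration:.2f} s, {expected_num_samples(utt)} samples")
```

Comparing `expected_num_samples` with the length of the loaded waveform is one simple way to spot truncated audio files or metadata that has drifted out of sync.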

3 Steps to Get Started Quickly

From browsing to loading, you can start your speech research project in just a few minutes

01

Browse Datasets

View dataset details on the Ace Data Cloud platform to understand metadata such as language distribution, speaker statistics, and licensing agreements.

02

Download Data

Download speech data slices for the target language on demand. Each slice contains audio files and the corresponding JSON metadata annotations.

03

Load and Use

Load the audio files with librosa.load(), then use the metadata annotations to start model training and speech synthesis experiments.
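
The download and load steps can be sketched as follows. This is a rough illustration that rests on two assumptions: that each JSON file in a downloaded slice holds a single metadata record, and that the record's `audio_path` is relative to the directory the data was downloaded into. The actual slice layout may differ. The helper names `iter_records` and `load_clip` are hypothetical; `librosa.load()` is the loader named in step 03 and must be installed separately.

```python
import json
from pathlib import Path


def iter_records(root, language=None):
    """Yield (metadata, audio_file) pairs found under a download directory.

    Assumes one record per JSON file and an `audio_path` field that is
    relative to `root`. Records whose audio file is missing are skipped.
    """
    root = Path(root)
    for meta_file in sorted(root.rglob("*.json")):
        with meta_file.open(encoding="utf-8") as f:
            meta = json.load(f)
        if language is not None and meta.get("language") != language:
            continue
        audio_file = root / meta["audio_path"]
        if audio_file.is_file():
            yield meta, audio_file


def load_clip(meta, audio_file):
    """Load one clip at the sample rate recorded in its metadata."""
    # Imported lazily so the metadata helpers above work without librosa.
    import librosa

    waveform, sample_rate = librosa.load(audio_file, sr=meta["sample_rate"])
    return waveform, sample_rate


# Example usage (requires librosa and a downloaded slice):
# for meta, audio_file in iter_records("path/to/emilia", language="en"):
#     waveform, sample_rate = load_clip(meta, audio_file)
#     print(meta["id"], waveform.shape, sample_rate)
```

Passing `sr=meta["sample_rate"]` keeps the audio at the rate stated in the metadata; without it, `librosa.load()` resamples to its default of 22,050 Hz.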

Start Exploring Emilia Speech Data

A large-scale multilingual speech dataset with open licensing, available immediately. Whether you are a speech synthesis researcher or an AI developer, Emilia is your ideal choice.