
NVIDIA AI Just Released the Largest Open-Source Speech AI Dataset and State-of-the-Art Models for European Languages


Understanding the Target Audience

The target audience for NVIDIA’s release of the Granary dataset and associated models includes developers, researchers, and businesses working in artificial intelligence, particularly speech recognition and translation. They typically want to add multilingual capabilities to their applications, improve user engagement, and increase accessibility across diverse linguistic backgrounds.

Pain Points

  • Limited access to high-quality datasets for underrepresented languages.
  • Challenges in achieving accurate speech recognition and translation in real-time applications.
  • Resource constraints that hinder the development of effective AI solutions.

Goals

  • To develop scalable, efficient AI models for speech recognition and translation.
  • To enhance user experiences across multiple languages.
  • To contribute to the democratization of AI technologies in Europe.

Interests

  • Innovations in AI and machine learning technologies.
  • Open-source resources and collaborative projects.
  • Real-world applications of multilingual AI solutions.

Communication Preferences

The audience prefers concise, technical communication that includes data-driven insights, practical applications, and peer-reviewed statistics. They value transparency and open discussions in forums and community platforms.

NVIDIA’s Granary: The Foundation of Multilingual Speech AI

NVIDIA has launched Granary, the largest open-source speech dataset for European languages, alongside two models trained on it: Canary-1b-v2 and Parakeet-tdt-0.6b-v3. The release aims to provide high-quality resources for automatic speech recognition (ASR) and automatic speech translation (AST), particularly for underrepresented European languages.

Granary Dataset Features

  • Largest open-source speech dataset for 25 European languages.
  • Pseudo-labeling pipeline that turns unlabeled audio into structured, high-quality training data and reduces manual annotation needs.
  • Supports both ASR and AST tasks.
  • Open access for global developers to train models at scale.

Granary offers around 1 million hours of audio, with 650,000 hours dedicated to speech recognition and 350,000 hours for speech translation. It covers nearly all official EU languages, plus Russian and Ukrainian, with a focus on languages like Croatian, Estonian, and Maltese that have limited annotated data.
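
As a quick check on those figures, the split can be expressed as shares of the total. This is a minimal sketch that uses the rounded hour counts quoted above; the function and variable names are illustrative and not part of any NVIDIA tooling.

```python
# Rounded hour counts as quoted in the text (approximate figures).
ASR_HOURS = 650_000  # speech recognition
AST_HOURS = 350_000  # speech translation


def task_shares(asr_hours: int, ast_hours: int) -> dict[str, float]:
    """Return each task's share of the combined audio hours."""
    total = asr_hours + ast_hours
    if total <= 0:
        raise ValueError("total hours must be positive")
    return {"asr": asr_hours / total, "ast": ast_hours / total}


shares = task_shares(ASR_HOURS, AST_HOURS)
print(f"total: {ASR_HOURS + AST_HOURS:,} hours")
print(f"ASR: {shares['asr']:.0%}, AST: {shares['ast']:.0%}")
```

On these rounded numbers, roughly two thirds of the audio is transcription data and one third is translation data.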

Canary-1b-v2: Multilingual ASR + Translation

Canary-1b-v2 is a billion-parameter encoder-decoder model trained on Granary. It transcribes all 25 supported languages and translates between English and the other 24. Key features include:

  • Support for 25 European languages, up from the four covered by earlier Canary models.
  • Performance comparable to models three times larger, with up to 10× faster inference.
  • Multitask capabilities across ASR and AST tasks.
  • Automatic punctuation, capitalization, and word/segment-level timestamps.
  • Robust performance under noisy conditions.
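
The multitask behavior described above (transcription within a language, translation to or from English) can be sketched as a small routing check. This is an illustration only, not the model's API: the `select_task` helper is hypothetical, and the ISO 639-1 codes are an assumption about which 25 languages are meant, since the article does not list them.

```python
# Assumed ISO 639-1 codes for the 25 languages; the article does not enumerate them.
SUPPORTED = frozenset({
    "bg", "hr", "cs", "da", "nl", "en", "et", "fi", "fr", "de", "el", "hu", "it",
    "lv", "lt", "mt", "pl", "pt", "ro", "sk", "sl", "es", "sv", "ru", "uk",
})


def select_task(source_lang: str, target_lang: str) -> str:
    """Return 'asr' for same-language transcription, 'ast' for translation to or from English."""
    for code in (source_lang, target_lang):
        if code not in SUPPORTED:
            raise ValueError(f"unsupported language code: {code}")
    if source_lang == target_lang:
        return "asr"
    if "en" in (source_lang, target_lang):
        return "ast"
    # The text only describes translation pairs that include English.
    raise ValueError("translation is only described between English and another language")


print(select_task("mt", "mt"))  # Maltese transcription
print(select_task("en", "hr"))  # English-to-Croatian translation
```

A pair such as French to German is rejected here because the text only describes translation between English and the other 24 languages.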

Parakeet-tdt-0.6b-v3: Real-Time Multilingual ASR

Parakeet-tdt-0.6b-v3 is a 600-million-parameter multilingual ASR model designed for high-throughput transcription in all 25 supported languages. Its features include:

  • Automatic language detection for seamless transcription.
  • Transcription of audio segments up to 24 minutes long in a single inference pass.
  • Low latency and batch processing for commercial applications.
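
For recordings longer than the 24-minute single-pass limit mentioned above, one option is to plan fixed-length windows before transcription. The sketch below is an assumption about how a caller might do this, not part of the model or its toolkit; `plan_segments` is a hypothetical helper that works on durations only and does not touch audio.

```python
MAX_SEGMENT_SECONDS = 24 * 60  # 24-minute limit quoted in the text


def plan_segments(
    duration_s: float, max_len_s: float = MAX_SEGMENT_SECONDS
) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most max_len_s seconds."""
    if duration_s < 0 or max_len_s <= 0:
        raise ValueError("duration must be non-negative and max length positive")
    total = float(duration_s)
    segments: list[tuple[float, float]] = []
    start = 0.0
    while start < total:
        end = min(start + max_len_s, total)
        segments.append((start, end))
        start = end
    return segments


# A 60-minute recording is covered by three windows: 24 + 24 + 12 minutes.
for start, end in plan_segments(60 * 60):
    print(f"{start / 60:.0f}-{end / 60:.0f} min")
```

Hard cuts like these can split a word at a boundary, so a real pipeline would likely cut at pauses or overlap adjacent windows.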

Impact on Speech AI Development

NVIDIA’s Granary dataset and model suite significantly advance the accessibility of speech AI technologies in Europe. They enable the development of:

  • Multilingual chatbots.
  • Customer service voice agents.
  • Near-real-time translation services.

With open access to these resources, developers, researchers, and businesses can create inclusive, high-quality applications that support linguistic diversity.

Explore Further

Check out Granary, NVIDIA Canary-1b-v2, and NVIDIA Parakeet-tdt-0.6b-v3.
