
Deepgram Adds 10 Languages and Keyterm Prompting to Nova-3

Written by
Sakshi Batavia
Created On
12 January, 2025

Speech recognition accuracy still drops sharply once audio moves beyond a narrow set of languages, accents, and domain vocabulary. For global teams deploying voice AI, this shows up as inconsistent transcripts, higher review effort, and uneven performance across regions.

In December 2025, Deepgram moved to address this gap by expanding Nova-3 with 10 additional monolingual languages and introducing multilingual keyterm prompting. The update increases Nova-3’s total language coverage to 31 languages and adds greater control over how specialized terminology is recognized in real production audio.

This blog examines what changed, the technical impact of the update, and what it means for teams building voice systems at scale.

Key Takeaways

  • Complex Languages First: The 10 new languages were added for linguistic difficulty, not popularity, targeting regions where ASR accuracy typically breaks down.
  • Real Accuracy Gains: Nova-3 shows consistent Word Error Rate reductions over Nova-2 across all new languages, driven by core model improvements rather than tuning.
  • Streaming Leads: Streaming transcription outperforms batch processing in roughly half of the tested languages, reinforcing Nova-3’s real-time focus.
  • Inference-Time Control: Multilingual keyterm prompting shifts vocabulary accuracy from retraining workflows to inference-time configuration.
  • Infrastructure Readiness: With broader language depth and predictable behavior, voice AI adoption increasingly depends on integration, not feasibility.

Why Language Expansion Matters for Enterprise Voice AI

The global voice AI market continues to grow, but legacy ASR systems have struggled with the world's linguistic diversity. Most traditional speech-to-text models were engineered around English phonetics and Western European language patterns, so accuracy degrades significantly when they are deployed on morphologically complex, tonal, or script-diverse languages.

Deepgram's approach to Nova-3 differs from this legacy model. Rather than forcing a single acoustic and linguistic pattern onto every language, Nova-3 was designed to adapt to each language's structural characteristics.

Why This Matters for Global Enterprises

  • Market reach: Organizations can now deploy voice AI across billions of additional speakers in previously underserved markets.
  • Accuracy consistency: Enterprise customers operating globally no longer need to employ separate transcription solutions for different language regions.
  • Operational efficiency: A unified platform reduces complexity, training overhead, and integration costs across multinational deployments.
  • Competitive advantage: Companies can now build voice-enabled features in customer support, healthcare, legal, and financial sectors across these 10 new markets.

The 10 New Languages: Deep Linguistic Challenges and Solutions

The December 2025 expansion adds 10 monolingual languages, grouped across Europe and Southeast Asia. These languages were selected because of their linguistic complexity and demand across customer support, media, and enterprise analytics workloads.

Southern and Eastern European Languages

  1. Greek (el): This language presents significant challenges due to inflectional morphology and variable word stress patterns. Greek vowel alternations and complex compound forms traditionally confuse generic ASR systems. Nova-3 addresses these through improved modeling of vowel shifts and morphological inflection, enabling more accurate recognition across formal and conversational contexts.
  2. Romanian (ro): As a Romance language heavily influenced by Slavic phonetics, Romanian combines multiple linguistic systems in ways that baffle conventional ASR models. The language features strong case inflection, variable stress patterns, and mid-word vowel shifts that require nuanced acoustic modeling. Nova-3 delivers improved handling of these endings, stress variations, and vowel transitions.
  3. Slovak (sk): This Slavic language presents particularly challenging consonant clusters and rich case systems that exceed the capacity of general ASR models. Nova-3 improves recognition of grammatical gender marking and declension patterns, making it significantly more accurate in real-time and batch scenarios.
  4. Catalan (ca): Sitting at the intersection of Spanish and French linguistic influence, Catalan exhibits vowel reduction patterns and multiple regional dialects. Nova-3 strengthens recognition across both conversational and broadcast speech variants, delivering consistency across Catalonia and the Valencian region.

Baltic and Western European Languages

  1. Lithuanian (lt): A Baltic language with free stress patterns and pitch accent, Lithuanian demands specialized acoustic modeling. The language's rich morphology and long compound formations traditionally challenge standard ASR systems. Nova-3 improves accuracy for complex morphological structures and compound word segmentation.
  2. Latvian (lv): Latvian's distinctive vowel length contrasts and consonant palatalization require careful acoustic modeling. Nova-3 increases clarity and keyword recall across varied speaking speeds, crucial for customer support and voice applications in Latvia and the diaspora.
  3. Estonian (et): With a three-way quantity (length) contrast that distinguishes short, long, and overlong sounds, Estonian demands phonetically precise models. Nova-3 improves segmentation and prosodic modeling in real-time scenarios, making streaming transcription more reliable.
  4. Flemish (nl-BE): The Belgian variant of Dutch exhibits distinctive regional phonetic patterns that diverge from standard Dutch. Nova-3 enhances accuracy across both colloquial and broadcast environments, recognizing the unique phonetic characteristics of Belgian Flemish speakers.
  5. Swiss German (de-CH): This regional variant exhibits extensive dialectal diversity within Switzerland itself. Nova-3 adapts more effectively to high-variance speech patterns, accommodating the significant differences between Swiss German and standard German pronunciation.

Southeast Asian Expansion

  1. Malay (ms): Malay combines Austronesian linguistic roots with significant English and Arabic loanwords. Nova-3 improves accuracy in multilingual and code-switched settings, where English and Malay frequently intermingle in conversation. This capability is particularly valuable for corporate environments across Malaysia, Singapore, and Brunei.

Quantifying Accuracy Improvements: The Nova-3 Advantage

Nova-3 shows measurable Word Error Rate improvements when compared with Nova-2 across all 10 newly added languages. Deepgram’s published benchmarks confirm that these gains appear across both streaming and batch transcription, with particularly strong results in real-time scenarios.

Verified Benchmark Outcomes

Deepgram’s published benchmarks show that accuracy gains are broad-based and repeatable across languages and transcription modes.

  • Consistent Accuracy Gains Across Languages: All 10 languages demonstrate lower Word Error Rate in at least one transcription mode when evaluated against Nova-2, confirming broad-based accuracy improvements rather than isolated gains.
  • Multi-Mode Improvements: Several languages show reductions in Word Error Rate across both batch and streaming transcription, indicating that improvements apply across different processing paths rather than a single optimized mode.
  • Top Performing Languages: Malay, Romanian, and Slovak record some of the largest relative accuracy gains in Deepgram’s tests, with reported reductions exceeding 20% in select scenarios.
  • Streaming Transcription Strength: Streaming transcription outperforms batch processing in roughly half of the evaluated languages, reinforcing Nova-3’s strength in real-time use cases such as live calls and captions.
  • Dialect And Morphology Resilience: Languages with complex morphology or strong regional variation still show measurable improvements, even where pronunciation and grammar vary significantly.
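For readers who want to reproduce this kind of comparison on their own audio, the sketch below shows how Word Error Rate and a relative reduction between two models are typically computed. It is a minimal illustration, not Deepgram's benchmarking code: the function names are hypothetical, the sample sentences and WER values are made up, and real evaluations normally apply text normalization (casing, punctuation, numerals) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only, not taken from Deepgram's published results.
sample_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
sample_gain = relative_reduction(0.20, 0.15)
```

In this example, one substitution and one deletion against six reference words give a WER of 2/6, and moving from 20% to 15% WER is a 25% relative reduction. That relative figure is the kind of number a phrase like "reductions exceeding 20%" usually refers to.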

Language-Specific Performance Patterns

Accuracy gains appear across structurally different languages, highlighting improvements across multiple linguistic dimensions.

  • Complex Morphology Languages: Lithuanian, Latvian, and Slovak display clear gains, reflecting improved handling of inflection, case systems, and compound word formation.
  • Dialect-Heavy Variants: Swiss German and Flemish achieve strong accuracy improvements despite regional pronunciation differences and informal speech patterns.
  • Southeast Asian Speech Characteristics: Malay shows particularly strong results in conversational audio, including mixed-language and informal speech contexts.

What These Results Indicate

The consistency of these gains points to underlying architectural improvements rather than language-specific tuning.

  • Improved Acoustic Modeling: Reduced error rates across unrelated language families point to stronger phoneme recognition rather than language-specific tuning.
  • Better Word Boundary Detection: Gains in morphologically rich languages suggest clearer segmentation of words and endings during decoding.
  • Real-Time Readiness: Stronger streaming results support Nova-3’s positioning for live transcription workloads rather than offline-only processing.

Taken together, these benchmark outcomes indicate that Nova-3’s accuracy gains come from underlying modeling improvements rather than incremental refinements, resulting in more reliable transcription across diverse languages and speech patterns.

Multilingual Keyterm Prompting: The Game-Changing Feature

Nova-3 Multilingual now supports a major feature upgrade: multilingual keyterm prompting, allowing developers to pass up to 500 tokens (approximately 100 words) to boost recognition of brand names, industry jargon, proper nouns, and mission-critical vocabulary across multilingual audio.

Core Capabilities of Multilingual Keyterm Prompting

  • Multi-Language Support In Single Request: Developers can provide key terms once, and Nova-3 automatically prioritizes recognition across all supported languages.
  • Massive Token Capacity: Supporting up to 500 tokens per request means developers can provide comprehensive domain vocabularies in a single API call.
  • Zero Retraining Requirement: Unlike traditional model customization approaches, keyterm prompting requires no model retraining, no fine-tuning datasets, and no extended development cycles.
  • Instant Inference-Time Adaptation: The system adapts in real-time as the transcription model processes audio.

How Multilingual Keyterm Prompting Works

Keyterm prompting operates during inference, not training. When a request includes keyterms, Nova-3 biases decoding toward those terms when acoustically plausible.

Keyterm Prompting – What It Does vs Does Not Do
What It Does | What It Does Not Do
Applies keyterm prompting during inference at transcription time | Does not retrain the model or modify training data
Biases decoding toward supplied keyterms when acoustically plausible | Does not force substitutions that contradict strong acoustic evidence
Prioritizes correct spelling for domain-specific terms when spoken | Does not override the core acoustic signal
Applies the same keyterm list across all supported languages in a request | Does not require language-specific keyterm configuration
Supports proper nouns, product names, acronyms, and technical phrases | Does not guarantee insertion of keyterms if they are not spoken
Improves recognition of recurring domain vocabulary | Does not reduce overall transcript readability or introduce artificial phrasing

Multilingual keyterm prompting gives teams practical control over vocabulary accuracy without adding training overhead or operational complexity.
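The idea of biasing toward keyterms only when the audio supports them can be pictured with a toy rescoring example. The sketch below is purely conceptual and assumes nothing about Nova-3's internals, which are not public. The `Hypothesis` class, the `rescore` function, the scores, and the `boost` value are all hypothetical. It shows a small bonus tipping the balance between acoustically similar candidates without overriding a candidate that the audio strongly favors.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # log-probability-like score; higher means a better acoustic fit


def rescore(hypotheses: list[Hypothesis], keyterms: list[str], boost: float = 1.0) -> Hypothesis:
    """Return the candidate with the best acoustic score plus keyterm bonus."""
    lowered = [term.lower() for term in keyterms]

    def biased_score(hyp: Hypothesis) -> float:
        text = hyp.text.lower()
        hits = sum(1 for term in lowered if term in text)
        return hyp.acoustic_score + boost * hits

    return max(hypotheses, key=biased_score)


keyterms = ["Clindamycin"]

# Close acoustic scores: the keyterm bonus selects the correctly spelled drug name.
ambiguous = [
    Hypothesis("take clinda my sin twice daily", -4.0),
    Hypothesis("take clindamycin twice daily", -4.5),
]
best_ambiguous = rescore(ambiguous, keyterms)

# Strong acoustic evidence for a different word: the bonus is too small to override it.
clear = [
    Hypothesis("take aspirin twice daily", -1.0),
    Hypothesis("take clindamycin twice daily", -6.0),
]
best_clear = rescore(clear, keyterms)
```

Here `best_ambiguous` is the "clindamycin" candidate, while `best_clear` remains the "aspirin" candidate. That mirrors the two columns of the table: terms are favored when plausible but are not inserted when they were not spoken.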

Real-World Use Cases for Keyterm Prompting

Multilingual keyterm prompting enables Nova-3 to recognize domain-specific vocabulary accurately across languages and environments where transcription errors carry real consequences.

1. Healthcare and Medical Transcription

Medical professionals rely on precise recognition of pharmaceutical names, anatomical terms, and clinical procedures. A medical transcription service using Nova-3 can now upload terminology like "Clindamycin," "Tretinoin," and specialized procedure names, ensuring these critical terms are recognized accurately even in noisy clinical environments.

2. Retail and Food Service

Quick-service restaurant chains face the challenge of transcribing drive-thru orders with precise product names and modifiers. When a customer orders a "Classic Buttery Jack Burger with Halfsie Fries," the model must recognize these proprietary product names accurately. Nova-3 with keyterm prompting ensures these brand-specific terms are recognized reliably.

3. Finance and Compliance

Financial institutions deploying voice AI for compliance recording and analysis require precise recognition of ticker symbols, financial instruments, and regulatory terminology. Multilingual keyterm prompting allows firms to maintain vocabularies across multiple language regions without retraining.

4. Global Customer Support

Multilingual enterprises operating across Europe, Asia, and Latin America can now pass domain-specific terminology once and have Nova-3 apply it across all supported languages simultaneously. A multinational bank supporting customers in Spanish, Portuguese, and English can maintain a single vocabulary of financial products, and Nova-3 will recognize them accurately regardless of which language the customer is speaking.

5. Emergency Response and Public Safety

Emergency dispatch centers handling multilingual communities can now prime Nova-3 with location names, emergency terminology, and agency-specific codes, ensuring accurate transcription of critical information regardless of the caller's primary language.

Across these scenarios, keyterm prompting shifts transcription accuracy from a post-processing concern to a configurable input, reducing risk in environments where terminology precision directly affects outcomes.

Implementation: Deploying Nova-3 for the 10 New Languages

Deploying Nova-3 for the newly supported languages requires minimal changes to existing pipelines. Deepgram’s update is designed to fit directly into current batch or streaming transcription workflows while supporting both cloud-hosted and self-hosted environments.

API Integration Simplified

Switching to any of the newly supported languages requires minimal API changes. Developers simply update their API request with the appropriate language code:

curl --request POST \
  --header "Authorization: Token YOUR_DEEPGRAM_API_KEY" \
  --header "Content-Type: audio/wav" \
  --data-binary @youraudio.wav \
  "https://api.deepgram.com/v1/listen?model=nova-3&language=el"

Supported language codes for the new expansion include: el (Greek), lt (Lithuanian), lv (Latvian), ms (Malay), sk (Slovak), ca (Catalan), et (Estonian), nl-BE (Flemish), de-CH (Swiss German), and ro (Romanian).
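For teams calling the API from application code rather than the command line, the same request can be assembled with Python's standard library. The sketch below is a minimal equivalent of the curl command above, under a few assumptions: the helper name `build_listen_request` and the `NEW_LANGUAGE_CODES` set are hypothetical, and the validation covers only the ten codes listed in this article. The function builds the request object without sending it, so it can be inspected or tested offline.

```python
import urllib.parse
import urllib.request

# The ten language codes listed above for the December 2025 expansion.
NEW_LANGUAGE_CODES = {
    "el", "ro", "sk", "ca", "lt", "lv", "et", "nl-BE", "de-CH", "ms",
}


def build_listen_request(audio: bytes, language: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    if language not in NEW_LANGUAGE_CODES:
        raise ValueError(f"{language!r} is not one of the newly added language codes")
    query = urllib.parse.urlencode({"model": "nova-3", "language": language})
    return urllib.request.Request(
        url=f"https://api.deepgram.com/v1/listen?{query}",
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )


# Placeholder bytes stand in for the contents of a real WAV file.
request = build_listen_request(b"\x00\x00", "el", "YOUR_DEEPGRAM_API_KEY")
```

Passing the result to `urllib.request.urlopen(request)` sends the audio. The shape of the JSON response is documented by Deepgram and is not covered here.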

Multilingual Keyterm Prompting Implementation

To use multilingual keyterm prompting, developers pass their list of key terms through the keyterm parameter in their Nova-3 Multilingual request. The system accepts up to 500 tokens per request, allowing comprehensive domain vocabularies to be applied across all supported languages simultaneously.
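A request URL carrying keyterms might be assembled as in the sketch below. It relies on assumptions that should be checked against Deepgram's current API reference: that the query parameter is spelled `keyterm` and repeated once per term, and that the multilingual model is selected with `language=multi`. The helper `build_keyterm_url` is hypothetical. The word-count guard is only a rough proxy for the 500-token limit, based on this article's "approximately 100 words" estimate, because the exact tokenization is not described here.

```python
from urllib.parse import quote, urlencode

# Rough stand-in for the 500-token limit (assumption: about 100 words).
MAX_KEYTERM_WORDS = 100


def build_keyterm_url(keyterms: list[str], model: str = "nova-3", language: str = "multi") -> str:
    """Return a listen URL with one keyterm query parameter per term."""
    cleaned = [term.strip() for term in keyterms if term.strip()]
    total_words = sum(len(term.split()) for term in cleaned)
    if total_words > MAX_KEYTERM_WORDS:
        raise ValueError(f"{total_words} words likely exceeds the keyterm budget")

    params = [("model", model), ("language", language)]
    params += [("keyterm", term) for term in cleaned]
    # quote_via=quote encodes spaces as %20 so multi-word terms stay intact.
    return "https://api.deepgram.com/v1/listen?" + urlencode(params, quote_via=quote)


url = build_keyterm_url(["Clindamycin", "Tretinoin", "Halfsie Fries"])
```

Keeping the vocabulary in a plain list means the same terms can be reused for every market. Only the audio changes between requests.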

Self-Hosted Deployment

Deepgram also supports self-hosted deployment through updated container images released with the December 2025 update (251210). Organizations with stringent data residency or security requirements can deploy Nova-3 on their own infrastructure while maintaining access to the same language expansion and keyterm prompting capabilities.

This deployment approach keeps adoption friction low, allowing teams to extend language coverage and vocabulary control without restructuring existing transcription workflows.

The Broader Trajectory: Deepgram's Global Expansion Strategy

Deepgram’s expansion of Nova-3 reflects a long-term strategy focused on global speech coverage, production reliability, and linguistic depth. Rather than prioritizing rapid multilingual claims, the company has expanded language support in stages, pairing large-scale real-world audio processing with language-specific modeling to support increasingly complex linguistic structures.

Key Elements of the Expansion Strategy

  • Industry-Scale Audio Processing: Deepgram reports processing more than 50,000 years of audio and transcribing over one trillion words across its platform, providing the data volume needed to support diverse accents, environments, and languages.
  • Enterprise Production Adoption: Major organizations, including Citi, Vapi, Groq, Twilio, and Spotify, rely on Deepgram’s speech models in production, indicating sustained operational usage rather than experimental deployment.
  • Mission-Critical Reliability: Deepgram’s technology is used to transcribe communications between the International Space Station and NASA’s Mission Control, highlighting reliability in high-stakes, low-error-tolerance environments.
  • Phased Language Expansion: Deepgram’s language roadmap shows a progression from widely spoken commercial languages toward languages with greater linguistic complexity, including tonal, morphologically rich, and dialect-heavy languages.
  • Language-Specific Modeling: Each new language release reflects targeted acoustic and linguistic training rather than a uniform multilingual approach, reducing error rates in languages that diverge significantly from English structure.
  • Real-World Speech Readiness: This strategy improves transcription performance in informal speech, regional pronunciation variation, and code-switching scenarios commonly found in production audio.

Deepgram’s global expansion strategy prioritizes depth over breadth by combining large-scale speech data with language-aware modeling. This approach supports Nova-3’s continued growth into complex linguistic regions while maintaining consistent transcription reliability across enterprise and mission-critical use cases.

What This Expansion Means for Voice AI Adoption

This Nova-3 update marks a shift in how speech recognition fits into product roadmaps. Language coverage and vocabulary control are starting to behave like configurable infrastructure, not long-term research projects. That change alters how quickly teams can move from prototype to production.

For builders, this reduces the need to treat language support as a separate engineering problem for each market. For organizations scaling voice systems, it introduces more predictable performance as deployments expand into new regions and industries. The focus moves away from constant tuning and correction toward reliability at launch.

As speech models gain broader language depth and finer control at inference time, voice interfaces begin to resemble other mature platform components. Adoption becomes a question of integration and design, rather than feasibility.

What Models Are Available In Deepgram’s Speech-to-Text API?

Deepgram’s API includes several model options, including Nova-3 (high-accuracy, multilingual, robust transcription), Flux (optimized for real-time conversational audio with turn detection), industry-tuned models for specific domains, and custom models trained on proprietary datasets.

Does Deepgram Support Multilingual Code-Switching Transcription?

Yes. Deepgram supports multilingual code-switching transcription when using Nova-3 or Nova-2 models, allowing audio with mixed languages to be transcribed in a single pass.

How Do I Get Started With Deepgram’s Speech-to-Text API?

To begin, developers obtain an API key, choose whether they need real-time streaming or batch processing, and select the appropriate model and language parameters in their request. Deepgram provides SDKs and code examples to simplify integration.

Can Keyterm Prompting Be Used With Nova-3 Multilingual?

Yes. Keyterm prompting has been expanded to work with the Nova-3 multilingual model. If an older Nova-3 version is used, the API returns an error until the model is updated to support keyterm prompting.

Is There a Feature for Identifying Speaker Turns or Dialogue Flow?

Deepgram’s API includes models (like Flux) that offer built-in conversational features such as turn detection and interruption handling, which improve contextual transcription for voice agents and dialogue-rich audio.
