Add Urdu (ur) G2P support to TTS pipeline #15445

@mwzkhalil

Feature Request: Urdu G2P for NeMo TTS

Motivation

Urdu is widely spoken (~230M speakers), but NeMo currently has no grapheme-to-phoneme (G2P) support for it.
Adding Urdu G2P would enable TTS model training and inference on Urdu text.

Urdu Script Notes

  • Written right-to-left (RTL) in the Perso-Arabic script, usually in the Nastaliq style (Naskh is also used)
  • Uses Urdu-specific characters (ے، ں، ڈ، ڑ، ھ etc.)
  • An IPA phoneme set exists for Urdu
  • Available resources: eSpeak NG supports Urdu, and open Urdu pronunciation dictionaries exist
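
To make the script notes concrete, here is a small sketch that looks up the Unicode code points and names of the Urdu-specific characters listed above and checks that they fall in the basic Arabic block (U+0600 to U+06FF). The helper names `describe` and `in_arabic_block` are hypothetical and only for illustration; a real normalizer would need to cover more ranges (presentation forms, diacritics, digits).

```python
import unicodedata

# The five Urdu-specific letters from the list above, written as escapes
# to avoid RTL rendering issues in source code.
URDU_SPECIFIC = ["\u06d2", "\u06ba", "\u0688", "\u0691", "\u06be"]


def describe(chars):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


def in_arabic_block(ch):
    """True if the character lies in the basic Arabic block U+0600..U+06FF."""
    return 0x0600 <= ord(ch) <= 0x06FF


if __name__ == "__main__":
    for code_point, name in describe(URDU_SPECIFIC):
        print(code_point, name)
```

A check like this could be used during text normalization to flag characters outside the expected range before they reach the G2P step.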

Proposed Implementation

  • Add UrduG2p class under nemo/collections/tts/g2p/
  • Add Urdu phoneme set / IPA mapping
  • Optionally: dictionary-based fallback using existing Urdu lexicons
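
As a rough illustration of the proposal above, the following is a minimal, standalone sketch of a dictionary-first G2P with a per-character fallback. It does not use or mirror NeMo's actual G2P base class; the class name `UrduG2pSketch`, the toy lexicon, and the toy character map are all assumptions made for this example, and the IPA values are illustrative placeholders rather than a vetted phoneme set.

```python
from typing import Dict, List


class UrduG2pSketch:
    """Hypothetical dictionary-first G2P; NOT NeMo's actual API."""

    def __init__(self, lexicon: Dict[str, List[str]], char_map: Dict[str, str], unk: str = "<unk>"):
        self.lexicon = lexicon      # word -> list of IPA phonemes
        self.char_map = char_map    # single character -> IPA phoneme
        self.unk = unk              # symbol emitted for unmapped characters

    def word_to_phonemes(self, word: str) -> List[str]:
        # Prefer the pronunciation dictionary when the word is known.
        if word in self.lexicon:
            return list(self.lexicon[word])
        # Otherwise fall back to a naive character-by-character mapping.
        return [self.char_map.get(ch, self.unk) for ch in word]

    def __call__(self, text: str) -> List[str]:
        phonemes: List[str] = []
        for i, word in enumerate(text.split()):
            if i > 0:
                phonemes.append(" ")  # keep word boundaries as a space token
            phonemes.extend(self.word_to_phonemes(word))
        return phonemes


# Toy data for illustration only (assumed, incomplete).
TOY_LEXICON = {
    "\u0627\u0631\u062f\u0648": ["ʊ", "r", "d", "uː"],  # the word "Urdu"
}
TOY_CHAR_MAP = {
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u0688": "ɖ",  # ARABIC LETTER DDAL
    "\u0691": "ɽ",  # ARABIC LETTER RREH
}

if __name__ == "__main__":
    g2p = UrduG2pSketch(TOY_LEXICON, TOY_CHAR_MAP)
    print(g2p("\u0627\u0631\u062f\u0648 \u0688\u0628"))
```

Because Urdu text usually omits short-vowel diacritics, a per-character fallback like this one is lossy; in practice the fallback would more likely be a lexicon with broader coverage or an external phonemizer such as eSpeak NG.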

I am willing to contribute a PR for this.
