Feature Request: Urdu G2P for NeMo TTS
Motivation
Urdu is a widely spoken language (~230M speakers) with no current G2P support in NeMo.
Adding Urdu G2P would enable TTS model training and inference for Urdu text.
Urdu Script Notes
- Written right-to-left (RTL) in the Perso-Arabic script, usually in the Nastaliq style (Naskh is also used)
- Uses Urdu-specific characters (ے، ں، ڈ، ڑ، ھ etc.)
- IPA phoneme set exists for Urdu
- Resources available: eSpeak NG supports Urdu, and open Urdu pronunciation dictionaries exist
Proposed Implementation
- Add `UrduG2p` class under `nemo/collections/tts/g2p/`
- Add Urdu phoneme set / IPA mapping
- Optionally: dictionary-based fallback using existing Urdu lexicons
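To make the proposal concrete, here is a rough standalone sketch of the lexicon-first, character-fallback idea. It does not use NeMo's actual G2P base class or interfaces. The `URDU_CHAR_TO_IPA` table, the constructor argument `lexicon`, and the return format (a flat list of phoneme strings with `" "` between words) are all assumptions for illustration, and the mapping is deliberately incomplete rather than a vetted phoneme inventory.

```python
import unicodedata

# Illustrative, incomplete grapheme-to-IPA table (an assumption, not a vetted inventory).
URDU_CHAR_TO_IPA = {
    "ب": "b", "پ": "p", "ت": "t̪", "ٹ": "ʈ", "د": "d̪", "ڈ": "ɖ",
    "ر": "r", "ڑ": "ɽ", "س": "s", "ش": "ʃ", "ک": "k", "گ": "ɡ",
    "ل": "l", "م": "m", "ن": "n", "ا": "aː", "ے": "eː",
    "ں": "\u0303",  # nasalization, emitted as a combining tilde
    "ھ": "ʰ",       # aspiration marker on the preceding consonant
}


class UrduG2p:
    """Hypothetical sketch: lexicon lookup first, per-character fallback second."""

    def __init__(self, lexicon=None):
        # lexicon maps an Urdu word to its list of IPA phonemes.
        self.lexicon = {
            unicodedata.normalize("NFC", word): list(phones)
            for word, phones in (lexicon or {}).items()
        }

    def __call__(self, text):
        text = unicodedata.normalize("NFC", text)
        phonemes = []
        for i, word in enumerate(text.split()):
            if i > 0:
                phonemes.append(" ")  # keep word boundaries
            if word in self.lexicon:
                phonemes.extend(self.lexicon[word])
            else:
                # Unknown characters are passed through unchanged.
                phonemes.extend(URDU_CHAR_TO_IPA.get(ch, ch) for ch in word)
        return phonemes


lexicon = {"کتاب": ["k", "ɪ", "t̪", "aː", "b"]}
g2p = UrduG2p(lexicon=lexicon)
in_lexicon = g2p("کتاب")  # served from the lexicon
fallback = g2p("ڈر")      # served by the character table
```

Because Urdu normally omits short-vowel diacritics, the character fallback drops short vowels (the second example yields only the two consonants), which is the main reason a lexicon or an eSpeak NG backend would likely be needed for usable output.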
I am willing to contribute a PR for this.