Replies: 1 comment
Arabic Quran ASR working in NeMo but failing in Riva — common issue! Likely causes:

Export correctly for Riva:

```shell
python nemo2riva.py \
    --nemo_model your_model.nemo \
    --out model.riva
```

Ensure the same preprocessing (match the NeMo config exactly):

```yaml
riva:
  preprocessor:
    sample_rate: 16000
    normalize: true
```

For Riva streaming:

```yaml
riva_asr:
  streaming:
    chunk_len: 0.8
    buffer_len: 3.2
```

Debug steps:

```shell
docker logs riva-server 2>&1 | grep -i error
```

We deploy multilingual ASR at RevolutionAI. Is it producing wrong characters or completely empty output?
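As a quick sanity check on the preprocessing side (an illustrative sketch, not something from this thread): before suspecting the model, confirm the input audio actually matches the format the Riva featurizer expects. NeMo's data loaders typically resample on load, so a file that works in NeMo may still be the wrong rate or channel count for Riva. Using only the Python standard library `wave` module:

```python
import wave

def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of mismatches between the WAV header and the
    expected featurizer input format (empty list means it matches)."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(f"sample rate {wf.getframerate()} != {expected_rate}")
        if wf.getnchannels() != expected_channels:
            problems.append(f"{wf.getnchannels()} channels, expected {expected_channels}")
        if wf.getsampwidth() != 2:
            problems.append(f"sample width {wf.getsampwidth()} bytes, expected 2 (16-bit PCM)")
    return problems
```

If this reports mismatches for files that decode fine in NeMo, that discrepancy alone can surface as empty transcripts in Riva.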
Summary
I am working on an Arabic ASR system for Quran recitation (tajweed, formal MSA Arabic).
The model works correctly when used directly in NeMo, but fails after conversion and deployment in NVIDIA Riva, where it produces empty outputs or only very short tokens.
I am trying to determine:
1. Which ASR model architecture is correct for Quran recitation and streaming
2. Whether my chosen model is supported for Riva streaming
3. What exact configuration (model type + riva-build flags) is required
4. Why NeMo inference works but Riva inference does not
⸻
My Goal
• Language: Arabic (ar-AR)
• Domain: Quran recitation
• Use case: Streaming ASR (low latency)
• Output: Full verse-level transcription, not partial tokens
• Deployment target: NVIDIA Riva
⸻
What Works
• Tokenizer is correct and verified:
  • SentencePiece tokenizer
  • Arabic text round-trip works (text → ids → text)
• Model inference in NeMo works correctly:
  • Full Arabic sentences are decoded
  • WER in NeMo training logs is correct
• Model fine-tuning completed successfully
• The .nemo model loads without error
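The round-trip property verified above can be stated as a one-line test: decode(encode(text)) must return the original text. The sketch below illustrates it with a toy character-level tokenizer (a hypothetical stand-in, since the real check would call `encode`/`decode` on the trained SentencePiece model instead):

```python
class ToyTokenizer:
    """Toy character-level stand-in for a SentencePiece tokenizer.
    With the real model you would call sp.encode(text) / sp.decode(ids)."""

    def __init__(self, corpus):
        chars = sorted(set(corpus))
        self.id_of = {c: i for i, c in enumerate(chars)}
        self.char_of = {i: c for c, i in self.id_of.items()}

    def encode(self, text):
        return [self.id_of[c] for c in text]

    def decode(self, ids):
        return "".join(self.char_of[i] for i in ids)

sample = "بسم الله الرحمن الرحيم"
tok = ToyTokenizer(sample)
assert tok.decode(tok.encode(sample)) == sample  # round-trip holds
```

Because this property holds for the real tokenizer here, the empty-output failure is unlikely to be a vocabulary problem on the NeMo side; it points at the export or the Riva decoding configuration instead.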
⸻
What Fails
After converting the same model to Riva and deploying:
• Both streaming and offline Riva pipelines return:
  • empty transcripts, or
  • a single repeated token (e.g. “وَ”)
• No runtime crash
• The Riva server starts successfully
• Models appear loaded, but inference output is unusable
⸻
Model Details
• Model type: EncDecHybridRNNTCTCBPEModel
• Encoder: Conformer / FastConformer
• Decoder: RNNT + CTC
• Tokenizer: SentencePiece (1024 vocab)
• Language: Arabic (ar-AR)
⸻
Conversion & Deployment Steps Used
NeMo → Riva
```shell
nemo2riva \
    --out Speech_To_Text_Finetuning.riva \
    --max-dim 5000 \
    --max-batch 4 \
    --device cuda \
    Speech_To_Text_Finetuning.nemo
```
Riva Build (Streaming)
```shell
riva-build speech_recognition \
    asr_streaming.rmir \
    Speech_To_Text_Finetuning.riva \
    --streaming=true \
    --decoder_type=greedy \
    --ms_per_timestep=40 \
    --chunk_size=4.8 \
    --left_padding_size=1.6 \
    --right_padding_size=1.6 \
    --max_batch_size=4 \
    --featurizer.use_utterance_norm_params=False \
    --featurizer.precalc_norm_time_steps=0 \
    --featurizer.precalc_norm_params=False \
    --language_code=ar-AR
```
Same issue occurs with offline pipeline.
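One concrete thing worth double-checking in the riva-build call above (my assumption, not something confirmed in this thread): `--ms_per_timestep` must match the encoder's output frame rate. A standard Conformer typically applies 4x subsampling to 10 ms features (40 ms per encoder frame), while FastConformer applies 8x (80 ms per frame). If the exported encoder is a FastConformer, `--ms_per_timestep=40` would make the decoder mis-align with the encoder output, which can plausibly surface as empty or single-token transcripts. The arithmetic:

```python
def encoder_frames(chunk_size_s, ms_per_timestep):
    """Number of encoder timesteps expected per streaming chunk."""
    return int(chunk_size_s * 1000 / ms_per_timestep)

# ms_per_timestep = window_stride_ms * subsampling_factor
conformer_ms = 10 * 4      # 40 ms/frame: what the flag above assumes
fastconformer_ms = 10 * 8  # 80 ms/frame: what a FastConformer emits

print(encoder_frames(4.8, conformer_ms))      # 120 frames per 4.8 s chunk
print(encoder_frames(4.8, fastconformer_ms))  # 60 frames per 4.8 s chunk
```

A 2x disagreement in expected frame count is exactly the kind of silent mismatch that produces no crash, a loaded model, and unusable output.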
⸻
Known Related Threads (Symptoms Match Exactly)
• “Finetuned ASR conformer returns only empty transcripts”
• “Issue Deploying Fine-Tuned Arabic Conformer Model in NVIDIA Riva”
• “Riva providing empty transcriptions but NeMo does not”
• “Known issue with conformer models – try --nn.use_trt_fp32”
• FastConformer RNNT models reported as not officially supported for Riva streaming
⸻
Key Observations
1. NeMo works, Riva does not
2. Empty or near-empty output is a known Riva failure mode
3. Multiple threads suggest:
• Conformer / FastConformer RNNT streaming is fragile or unsupported
• TRT FP16 causes silent decoding failures
4. Canary models are offline-only
5. Parakeet models are designed for streaming but have limited Arabic coverage
⸻
Questions (Core of This Issue)
1. Which ASR model architecture is officially supported for Arabic streaming ASR in Riva?
• Conformer-CTC?
• Citrinet?
• Parakeet?
• Something else?
2. Is EncDecHybridRNNTCTCBPEModel supported for streaming in Riva?
• If not, what is the recommended alternative?
3. Is Quran recitation a valid use case for Riva streaming ASR, or is offline decoding required?
4. Which riva-build flags are mandatory to avoid empty outputs?
• --nn.use_trt_fp32
• disabling VAD?
• different chunk/padding constraints?
5. Is there an official reference pipeline for Arabic ASR deployment in Riva?
⸻
Environment
• OS: Ubuntu 22.04
• GPU: RTX 3060 (6GB)
• CUDA: 12.x
• NeMo: recent version
• Riva: 2.x
• Audio: 16kHz mono WAV
• Language code: ar-AR
⸻
What I Am Looking For
• A clear recommendation:
  • correct model
  • correct decoding mode (streaming vs offline)
  • correct Riva configuration
• Confirmation whether my current approach is fundamentally incompatible
• A known-good Arabic Riva ASR deployment example
⸻
Thank you for your time.
I am happy to provide logs, configs, or a minimal reproduction if needed.