Add ASR-EOU models and training/eval scripts #14740

Open
stevehuang52 wants to merge 137 commits into main from heh/eou_pr

@stevehuang52 stevehuang52 commented Sep 16, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Add ASR-EOU models, datasets, training/inference scripts, utilities, etc.

Collection: [asr]

Changelog

EOU model:

  • nemo/collections/asr/models/asr_eou_models.py: extend the RNNT and Hybrid-CTC-RNNT models to the ASR-EOU task, using a new dataloader and new metrics

EOU dataset:

  • nemo/collections/asr/data/audio_to_eou_label_lhotse.py: new ASR-EOU dataset

EOU examples:

  • examples/asr/asr_eou: Tutorial, training and testing scripts
  • examples/asr/conf/asr_eou: Yaml files for ASR-EOU models

EOU metrics and utils:

  • nemo/collections/asr/parts/utils/eou_utils.py

EOU test:

  • tests/collections/asr/test_asr_eou.py: unit test on EOU metrics

EOU auxiliary scripts:

  • scripts/asr_eou: scripts for cleaning data, creating the ASR-EOU tokenizer, and creating noisy eval data
  • tools/nemo_forced_aligner/align_eou.py: modified NFA for injecting EOU timestamps to ASR manifests (@weiqingw4ng)

EOU related changes to existing files:

  • nemo/collections/asr/metrics/wer.py: add an option to make the decode function return hypotheses, and a way for outside callers to retrieve those hypotheses.
  • nemo/collections/asr/modules/rnnt.py: add the parameter keep_hypotheses, which keeps the hypotheses of the decoded outputs after forward so that an outside caller can obtain them.
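
The keep_hypotheses idea described above can be illustrated with a minimal sketch. Everything here except the parameter name keep_hypotheses is an assumption made for illustration: ToyDecoder, Hypothesis, and get_hypotheses are hypothetical names, not the classes or methods in nemo/collections/asr/modules/rnnt.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    """Hypothetical container for one decoded output."""
    text: str
    score: float = 0.0


class ToyDecoder:
    """Hypothetical stand-in for a decoding module; not NeMo's actual class."""

    def __init__(self, keep_hypotheses: bool = False):
        self.keep_hypotheses = keep_hypotheses
        self._hypotheses: Optional[List[Hypothesis]] = None

    def forward(self, token_batches: List[List[str]]) -> List[str]:
        # "Decoding" here just joins tokens; a real module would run the model.
        hyps = [Hypothesis(text=" ".join(tokens)) for tokens in token_batches]
        # Store hypotheses only when asked, so the default path keeps no extra state.
        self._hypotheses = hyps if self.keep_hypotheses else None
        return [h.text for h in hyps]

    def get_hypotheses(self) -> Optional[List[Hypothesis]]:
        """Return hypotheses from the latest forward call, or None if not kept."""
        return self._hypotheses
```

With keep_hypotheses=True, a caller such as a metric can read the stored hypotheses after forward instead of decoding a second time; with the default, nothing is retained.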

Bug fixes:

  • examples/asr/asr_hybrid_transducer_ctc/helpers/convert_nemo_asr_hybrid_to_ctc.py: fix docstring
  • examples/asr/transcribe_speech.py: fix changing att_context_size
  • nemo/collections/asr/losses/ssl_losses/mlm.py: fix bug when mask is None
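
The PR text does not say how the mask-is-None case is handled in mlm.py. As a hedged illustration of that kind of guard, the sketch below uses one common convention (an assumption, not confirmed by the PR): a missing mask means every frame is scored. The function masked_nll and its arguments are hypothetical and use plain Python lists rather than tensors.

```python
import math
from typing import List, Optional, Sequence


def masked_nll(
    log_probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    mask: Optional[Sequence[bool]] = None,
) -> float:
    """Average negative log-likelihood over the frames selected by `mask`.

    log_probs[i][c] is the log-probability of class c at frame i.
    """
    if mask is None:
        # Assumption for this sketch: no mask means "score every frame".
        mask = [True] * len(targets)
    picked: List[float] = [
        -frame[t] for frame, t, m in zip(log_probs, targets, mask) if m
    ]
    if not picked:
        # No selected frames: return 0.0 rather than dividing by zero.
        return 0.0
    return sum(picked) / len(picked)
```

Without the guard, code that indexes or sums the mask would fail when the caller passes None.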

ASR improvements:

  • docker/Dockerfile.speech: update pytorch container to 25.12
  • nemo/collections/asr/modules/conformer_encoder.py and nemo/collections/asr/modules/ssl_modules/multi_layer_feat.py: merge duplicate code of ConformerMultiLayerFeatureExtractor
  • nemo/collections/asr/modules/lstm_decoder.py: add the option to not add blank token to the classifier, so that it can be used as a general frame classifier.
  • nemo/collections/common/data/lhotse/dataloader.py: resample noise to the same sample rate as the data config.
  • scripts/speech_recognition/convert_to_tarred_audio_dataset.py: add the option to read from multiple files
  • scripts/speech_recognition/oomptimizer.py: add the option to load the model from local files
  • tools/nemo_forced_aligner/utils/data_prep.py: add support for automatically resolving relative file paths.
  • examples/asr/transcribe_speech_distributed.py: a script that extends transcribe_speech.py to transcribe large amounts of audio across multiple nodes and GPUs. It can split a large manifest into smaller ones and merge the results once all of them are finished. This avoids a limitation of transcribe_speech_parallel.py, which restarts from the beginning if it does not finish within one cluster job. The new script also automatically inherits any improvements to transcribe_speech.py, whereas transcribe_speech_parallel.py needs separate maintenance. In the future, transcribe_speech_distributed.py can be updated to support loading tarred datasets, so that transcribe_speech_parallel.py can be deprecated.
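
The split-then-merge workflow described for transcribe_speech_distributed.py can be sketched roughly as follows. This is not the script's actual code: the function names, the shard file naming, and the "output file exists means shard finished" rule are all assumptions for illustration. It only assumes that a manifest is a JSON-lines file with one entry per line.

```python
import json
from pathlib import Path
from typing import List


def split_manifest(manifest_path, out_dir, num_shards: int) -> List[Path]:
    """Split a JSON-lines manifest into `num_shards` smaller manifests."""
    lines = [ln for ln in Path(manifest_path).read_text().splitlines() if ln.strip()]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = []
    for i in range(num_shards):
        shard = lines[i::num_shards]  # round-robin assignment of entries
        path = out_dir / f"shard_{i:04d}.json"  # hypothetical naming scheme
        path.write_text("".join(ln + "\n" for ln in shard))
        shard_paths.append(path)
    return shard_paths


def pending_shards(shard_paths, results_dir) -> List[Path]:
    """Return shards with no output file yet, so a restarted job can resume."""
    results_dir = Path(results_dir)
    return [p for p in shard_paths if not (results_dir / p.name).exists()]


def merge_results(shard_paths, results_dir, merged_path) -> None:
    """Concatenate per-shard outputs into one manifest once all shards are done."""
    results_dir = Path(results_dir)
    missing = pending_shards(shard_paths, results_dir)
    if missing:
        raise RuntimeError(f"{len(missing)} shard(s) not finished yet")
    with open(merged_path, "w") as fout:
        for p in shard_paths:
            fout.write((results_dir / p.name).read_text())
```

Because finished shards are detected from their output files, a job that hits its time limit can be resubmitted and will only process what pending_shards returns, rather than starting over.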

weiqingw4ng and others added 30 commits March 10, 2025 11:39
Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: He Huang <heh@nvidia.com>
nithinraok previously approved these changes Feb 24, 2026

@nithinraok nithinraok left a comment

LGTM from ASR end.

# limitations under the License.

-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.07-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:25.12-py3

If this PR merges after the 26.02 release, please update this.

Signed-off-by: He Huang <heh@nvidia.com>
@github-actions

[🤖]: Hi @stevehuang52 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.
