Add ASR-EOU models and training/eval scripts by stevehuang52 · Pull Request #14740 · NVIDIA-NeMo/NeMo

stevehuang52 · 2025-09-16T15:39:43Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add ASR-EOU models, datasets, training/inference scripts, utilities, etc.

Collection: [asr]

Changelog

EOU model:

nemo/collections/asr/models/asr_eou_models.py: extend the RNNT and Hybrid-CTC-RNNT models to the ASR-EOU task, using a new dataloader and new metrics

EOU dataset:

nemo/collections/asr/data/audio_to_eou_label_lhotse.py: new ASR-EOU dataset

EOU examples:

examples/asr/asr_eou: Tutorial, training and testing scripts
examples/asr/conf/asr_eou: YAML config files for ASR-EOU models

EOU metrics and utils:

nemo/collections/asr/parts/utils/eou_utils.py

EOU test:

tests/collections/asr/test_asr_eou.py: unit test on EOU metrics
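
The changelog does not spell out which EOU metrics are tested. As an illustration only, here is a minimal sketch of one plausible shape for such a metric: it compares predicted end-of-utterance times against reference times and counts latencies, early cutoffs, and misses. The names `EOUResult` and `score_eou`, the fields, and the `tolerance` parameter are all assumptions made for this example and are not taken from eou_utils.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EOUResult:
    # All fields are hypothetical; eou_utils.py may define different ones.
    latencies: List[float]  # predicted - reference, for on-time or late detections
    early_cutoffs: int      # predictions earlier than the reference by more than the tolerance
    missed: int             # references that received no prediction


def score_eou(
    reference: List[float],
    predicted: List[Optional[float]],
    tolerance: float = 0.1,
) -> EOUResult:
    """Compare one predicted EOU time (or None) per reference EOU time, in seconds."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    latencies: List[float] = []
    early = 0
    missed = 0
    for ref, hyp in zip(reference, predicted):
        if hyp is None:
            missed += 1
        elif hyp < ref - tolerance:
            early += 1
        else:
            # Slightly-early predictions inside the tolerance count as zero latency.
            latencies.append(max(0.0, hyp - ref))
    return EOUResult(latencies, early, missed)
```

A unit test on a metric like this would typically feed in a handful of hand-built timestamps covering each outcome (late, early, missed) and assert on the counts.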

EOU auxiliary scripts:

scripts/asr_eou: scripts for cleaning data, creating the ASR-EOU tokenizer, and creating noisy eval data
tools/nemo_forced_aligner/align_eou.py: modified NFA for injecting EOU timestamps into ASR manifests (@weiqingw4ng)

EOU related changes to existing files:

nemo/collections/asr/metrics/wer.py: add an option to make the decode function return hypotheses, and a way to retrieve those hypotheses from outside.
nemo/collections/asr/modules/rnnt.py: add the keep_hypotheses parameter, which keeps the hypotheses of the decoded outputs after forward so that an outside caller can obtain them.
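
The pattern described in the two items above (cache decoded hypotheses during forward, expose them to an outside caller) can be sketched roughly as follows. This is a toy stand-in, not the NeMo code: `ToyDecoder`, `get_hypotheses`, and the string-joining "decoding" are assumptions for illustration. Only the parameter name `keep_hypotheses` comes from the changelog.

```python
from typing import List, Optional


class ToyDecoder:
    """Stand-in for a decoding module; not the NeMo class."""

    def __init__(self, keep_hypotheses: bool = False):
        self.keep_hypotheses = keep_hypotheses
        self._hypotheses: Optional[List[str]] = None

    def forward(self, token_ids: List[List[int]], vocab: List[str]) -> List[str]:
        # "Decoding" here is just a vocabulary lookup per token id.
        hyps = ["".join(vocab[i] for i in seq) for seq in token_ids]
        # Cache only when asked, so the default path keeps no extra references.
        self._hypotheses = hyps if self.keep_hypotheses else None
        return hyps

    def get_hypotheses(self) -> List[str]:
        """Return the hypotheses cached by the most recent forward call."""
        if self._hypotheses is None:
            raise RuntimeError("no hypotheses cached; set keep_hypotheses=True and call forward first")
        return self._hypotheses
```

Making the cache opt-in keeps the default behavior unchanged for existing callers, which is presumably why the PR exposes it as a parameter.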

Bug fixes:

examples/asr/asr_hybrid_transducer_ctc/helpers/convert_nemo_asr_hybrid_to_ctc.py: fix docstring
examples/asr/transcribe_speech.py: fix changing att_context_size
nemo/collections/asr/losses/ssl_losses/mlm.py: fix bug when mask is None
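
The changelog does not say how the `mask is None` case was fixed in mlm.py. One common way to handle an optional mask, shown here as a sketch on plain Python lists rather than tensors, is to treat a missing mask as "select every position". The function name `masked_mean` is hypothetical.

```python
from typing import List, Optional


def masked_mean(values: List[float], mask: Optional[List[bool]] = None) -> float:
    """Mean of `values` over positions where `mask` is True.

    A mask of None means every position is included.
    """
    if mask is None:
        mask = [True] * len(values)
    if len(mask) != len(values):
        raise ValueError("mask and values must have the same length")
    selected = [v for v, m in zip(values, mask) if m]
    # Return 0.0 instead of dividing by zero when nothing is selected.
    return sum(selected) / len(selected) if selected else 0.0
```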

ASR improvements:

docker/Dockerfile.speech: update pytorch container to 25.12
nemo/collections/asr/modules/conformer_encoder.py and nemo/collections/asr/modules/ssl_modules/multi_layer_feat.py: merge duplicate code of ConformerMultiLayerFeatureExtractor
nemo/collections/asr/modules/lstm_decoder.py: add the option to not add blank token to the classifier, so that it can be used as a general frame classifier.
nemo/collections/common/data/lhotse/dataloader.py: resample noise to the sample rate set in the data config.
scripts/speech_recognition/convert_to_tarred_audio_dataset.py: add the option to read from multiple files
scripts/speech_recognition/oomptimizer.py: add the option to load the model from local files
tools/nemo_forced_aligner/utils/data_prep.py: add support for automatically resolving relative file paths.
examples/asr/transcribe_speech_distributed.py: a script that extends transcribe_speech.py to transcribe large amounts of audio across multiple nodes and GPUs. It can split a large manifest into smaller ones and merge the results once all of them are finished. This avoids a problem with transcribe_speech_parallel.py, which restarts from the beginning if it does not finish within one cluster job. The new script also automatically inherits any improvements to transcribe_speech.py, whereas transcribe_speech_parallel.py needs separate maintenance. In the future, transcribe_speech_distributed.py can be updated to support loading tarred datasets, so that transcribe_speech_parallel.py can be deprecated.
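
To illustrate the split-and-merge idea behind transcribe_speech_distributed.py, here is a minimal sketch that assumes a JSON-lines manifest (one JSON object per line). The function names, the `shard_XXXXX.json` naming scheme, and the convention that a finished shard has a result file of the same name are all assumptions for this example and may differ from what the script actually does.

```python
import json
from pathlib import Path
from typing import List


def split_manifest(manifest: Path, out_dir: Path, shard_size: int) -> List[Path]:
    """Write consecutive chunks of a JSON-lines manifest to out_dir/shard_XXXXX.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [ln for ln in manifest.read_text(encoding="utf-8").splitlines() if ln.strip()]
    shards: List[Path] = []
    for start in range(0, len(lines), shard_size):
        shard = out_dir / f"shard_{start // shard_size:05d}.json"
        shard.write_text("\n".join(lines[start:start + shard_size]) + "\n", encoding="utf-8")
        shards.append(shard)
    return shards


def pending_shards(shards: List[Path], result_dir: Path) -> List[Path]:
    """Shards with no result file yet; a restarted job only needs to process these."""
    return [s for s in shards if not (result_dir / s.name).exists()]


def merge_results(shards: List[Path], result_dir: Path, merged: Path) -> int:
    """Concatenate per-shard results in shard order; refuse to merge if any are missing."""
    missing = pending_shards(shards, result_dir)
    if missing:
        raise FileNotFoundError(f"{len(missing)} shard(s) not finished")
    count = 0
    with merged.open("w", encoding="utf-8") as fout:
        for shard in shards:
            for line in (result_dir / shard.name).read_text(encoding="utf-8").splitlines():
                if line.strip():
                    json.loads(line)  # fail early on a corrupted result line
                    fout.write(line + "\n")
                    count += 1
    return count
```

Because completed shards are detected from files on disk, a job that is interrupted and resubmitted picks up only the unfinished shards instead of starting over, which is the restart problem the description attributes to transcribe_speech_parallel.py.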


nithinraok

LGTM from ASR end.

nithinraok · 2026-02-24T15:33:24Z

docker/Dockerfile.speech

 # limitations under the License.

-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.07-py3
+ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:25.12-py3


If this PR merges after the 26.02 release, please update this.


github-actions · 2026-02-26T22:11:07Z

[🤖]: Hi @stevehuang52 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

weiqingw4ng and others added 30 commits March 10, 2025 11:39

initial commit for end-of-utterance detection

7681d71

Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>

change targets to long() type

68e6b6f

Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>

change output_types()

0069fac

Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>

add random padding and refactor for multiple utterances per sample

1550b56

Signed-off-by: stevehuang52 <heh@nvidia.com>

add handling multiple text groundtruth

867c799

Signed-off-by: stevehuang52 <heh@nvidia.com>

Merge remote-tracking branch 'origin/main' into end_of_utterance

532494a

update and add eval scripts

c9c8a0d

Signed-off-by: stevehuang52 <heh@nvidia.com>

drop sou label and add eob label

201b706

Signed-off-by: stevehuang52 <heh@nvidia.com>

update hybrid-rnnt-ctc and rnnt models to use eou dataset

af380f6

Signed-off-by: stevehuang52 <heh@nvidia.com>

set default return eou frame label to false

82cdb60

Signed-off-by: stevehuang52 <heh@nvidia.com>

handle empty utterance

9b6f95d

Signed-off-by: stevehuang52 <heh@nvidia.com>

add script for injecting special eou tokens into SPE tokenizer

4228641

Signed-off-by: stevehuang52 <heh@nvidia.com>

refactor eou eval utils

ca5dd35

Signed-off-by: stevehuang52 <heh@nvidia.com>

add eou rnnt training

df3151d

Signed-off-by: stevehuang52 <heh@nvidia.com>

update doc

514b6d2

Signed-off-by: stevehuang52 <heh@nvidia.com>

update data augmentation

a63bef3

Signed-off-by: stevehuang52 <heh@nvidia.com>

update data related functions

bc44d9a

Signed-off-by: stevehuang52 <heh@nvidia.com>

fix tokenizer with eou tokens

b6081cf

Signed-off-by: stevehuang52 <heh@nvidia.com>

adding eou force aligner

442dfec

Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>

update for eou

13bdc04

Signed-off-by: stevehuang52 <heh@nvidia.com>

Merge branch 'end_of_utterance' of https://github.com/NVIDIA/NeMo into end_of_utterance

cae3171

fix the case when 'segments_level_ctm_filepath' is not produced

813be94

Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>

fix force aligner

e9cf11a

Signed-off-by: stevehuang52 <heh@nvidia.com>

Merge branch 'end_of_utterance' of https://github.com/NVIDIA/NeMo into end_of_utterance

d7da23e

fix aligner

fb4a815

Signed-off-by: stevehuang52 <heh@nvidia.com>

update for asr-eou

e8a49cd

Signed-off-by: stevehuang52 <heh@nvidia.com>

clean up and update infer

5667d71

Signed-off-by: stevehuang52 <heh@nvidia.com>

update

c9502b4

Signed-off-by: stevehuang52 <heh@nvidia.com>

update

016e5cc

Signed-off-by: stevehuang52 <heh@nvidia.com>

fix rnnt_decoding for empty string

78dbb45

Signed-off-by: stevehuang52 <heh@nvidia.com>

rename and move to scripts/asr_eou

bf3bd3c

Signed-off-by: He Huang <heh@nvidia.com>

nithinraok previously approved these changes Feb 24, 2026


stevehuang52 added the Run CICD label Feb 24, 2026

stevehuang52 temporarily deployed to test February 24, 2026 16:08 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Feb 24, 2026

fix ci

4d21b81

Signed-off-by: He Huang <heh@nvidia.com>

stevehuang52 dismissed nithinraok’s stale review via 4d21b81 February 24, 2026 17:40

stevehuang52 added the Run CICD label Feb 24, 2026

stevehuang52 temporarily deployed to test February 24, 2026 17:42 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Feb 24, 2026

fix ci

e30af2a

Signed-off-by: He Huang <heh@nvidia.com>

github-advanced-security bot found potential problems Feb 24, 2026


nemo/collections/asr/modules/conformer_encoder.py (fixed)

stevehuang52 added 2 commits February 24, 2026 11:17

clean up

3f8da4d

Signed-off-by: He Huang <heh@nvidia.com>

clean up

dddffb4

Signed-off-by: He Huang <heh@nvidia.com>

stevehuang52 added the Run CICD label Feb 24, 2026

github-actions bot removed the Run CICD label Feb 24, 2026

fix linting

6b2f4d0

Signed-off-by: He Huang <heh@nvidia.com>

stevehuang52 added the Run CICD label Feb 25, 2026

stevehuang52 temporarily deployed to test February 25, 2026 02:04 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Feb 25, 2026

Merge branch 'main' into heh/eou_pr

d9a06e9

stevehuang52 added the Run CICD label Feb 25, 2026

stevehuang52 temporarily deployed to test February 25, 2026 14:37 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Feb 25, 2026

stevehuang52 added 2 commits February 26, 2026 12:09

fix ci

bbd9816

Signed-off-by: He Huang <heh@nvidia.com>

Merge branch 'heh/eou_pr' of https://github.com/NVIDIA/NeMo into heh/eou_pr

5b153fb

stevehuang52 added the Run CICD label Feb 26, 2026

stevehuang52 temporarily deployed to test February 26, 2026 20:11 — with GitHub Actions Inactive

github-actions bot removed the Run CICD label Feb 26, 2026

stevehuang52 wants to merge 137 commits into main from heh/eou_pr

4 participants
