Add ASR-EOU models and training/eval scripts #14740
Open
stevehuang52 wants to merge 137 commits into main
Signed-off-by: Weiqing Wang <weiqingw@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
…o end_of_utterance
Signed-off-by: He Huang <heh@nvidia.com>
nithinraok previously approved these changes on Feb 24, 2026
     # limitations under the License.

    -ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:24.07-py3
    +ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:25.12-py3
If this PR merges after the 26.02 release, please update this.
[🤖]: Hi @stevehuang52 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals.
Important

The `Update branch` button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add ASR-EOU models, datasets, training/inference scripts, utilities, etc.
Collection: [asr]
Changelog
EOU model:
- `nemo/collections/asr/models/asr_eou_models.py`: extend the RNNT and Hybrid-CTC-RNNT models to the ASR-EOU task by using a new dataloader and new metrics

EOU dataset:
- `nemo/collections/asr/data/audio_to_eou_label_lhotse.py`: new ASR-EOU dataset

EOU examples:
- `examples/asr/asr_eou`: tutorial, training and testing scripts
- `examples/asr/conf/asr_eou`: YAML files for ASR-EOU models

EOU metrics and utils:
- `nemo/collections/asr/parts/utils/eou_utils.py`

EOU test:
- `tests/collections/asr/test_asr_eou.py`: unit test on EOU metrics

EOU auxiliary scripts:
- `scripts/asr_eou`: scripts for data cleaning, creating the ASR-EOU tokenizer, and creating noisy eval data
- `tools/nemo_forced_aligner/align_eou.py`: modified NFA for injecting EOU timestamps into ASR manifests (@weiqingw4ng)

EOU related changes to existing files:
- `nemo/collections/asr/metrics/wer.py`: add an option to configure the decode function to return hypotheses, and a way to retrieve those hypotheses from outside.
- `nemo/collections/asr/modules/rnnt.py`: add the parameter `keep_hypotheses` to keep the hypotheses of the decoded outputs after forward, so that an outside caller can obtain them.

Bug fixes:
- `examples/asr/asr_hybrid_transducer_ctc/helpers/convert_nemo_asr_hybrid_to_ctc.py`: fix docstring
- `examples/asr/transcribe_speech.py`: fix changing `att_context_size`
- `nemo/collections/asr/losses/ssl_losses/mlm.py`: fix bug when mask is None

ASR improvements:
- `docker/Dockerfile.speech`: update the PyTorch container to 25.12
- `nemo/collections/asr/modules/conformer_encoder.py` and `nemo/collections/asr/modules/ssl_modules/multi_layer_feat.py`: merge duplicate code of `ConformerMultiLayerFeatureExtractor`
- `nemo/collections/asr/modules/lstm_decoder.py`: add an option to not add the blank token to the classifier, so that it can be used as a general frame classifier.
- `nemo/collections/common/data/lhotse/dataloader.py`: resample noise to the same sample rate as the data config.
- `scripts/speech_recognition/convert_to_tarred_audio_dataset.py`: add an option to read from multiple files
- `scripts/speech_recognition/oomptimizer.py`: add an option to load the model from local files
- `tools/nemo_forced_aligner/utils/data_prep.py`: add support for automatically resolving relative file paths.
- `examples/asr/transcribe_speech_distributed.py`: a script that extends `transcribe_speech.py` to transcribe a large number of audio files using multiple nodes and GPUs. The script can split large manifests into small ones and merge them back when all are finished. This avoids the issue in `transcribe_speech_parallel.py`, which restarts from the beginning if it does not finish within one cluster job. The new script also automatically inherits any improvements to `transcribe_speech.py`, while `transcribe_speech_parallel.py` needs separate maintenance. In the future, we can update `transcribe_speech_distributed.py` to support loading tarred datasets, so that `transcribe_speech_parallel.py` can be deprecated.
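
For reviewers who want a picture of the split-and-merge idea described for `transcribe_speech_distributed.py`, here is a minimal sketch. It is not the script's actual code: the function names (`split_manifest`, `pending_shards`, `merge_results`), the shard file naming, and the directory layout are all hypothetical, and the only assumption about the data is that a manifest is a JSON-lines file with one utterance per line.

```python
import json
import math
from pathlib import Path


def split_manifest(manifest_path, out_dir, num_shards):
    """Split a JSON-lines manifest into contiguous shards (hypothetical layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(manifest_path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") + "\n" for line in f if line.strip()]
    # Contiguous chunks keep the original order when shards are merged back.
    chunk = max(1, math.ceil(len(lines) / num_shards))
    shard_paths = []
    for shard_id in range(num_shards):
        shard_path = out_dir / f"shard_{shard_id:04d}.json"
        with open(shard_path, "w", encoding="utf-8") as f:
            f.writelines(lines[shard_id * chunk:(shard_id + 1) * chunk])
        shard_paths.append(shard_path)
    return shard_paths


def pending_shards(shard_paths, result_dir):
    """Return shards with no output file yet, so a restarted job skips finished work."""
    result_dir = Path(result_dir)
    return [p for p in shard_paths if not (result_dir / p.name).exists()]


def merge_results(shard_paths, result_dir, merged_path):
    """Concatenate per-shard outputs in shard order; refuse to merge partial results."""
    result_dir = Path(result_dir)
    missing = pending_shards(shard_paths, result_dir)
    if missing:
        raise RuntimeError(f"{len(missing)} shard(s) not finished yet")
    count = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for shard in shard_paths:
            with open(result_dir / shard.name, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        json.loads(line)  # fail early on a corrupted entry
                        out.write(line.rstrip("\n") + "\n")
                        count += 1
    return count
```

In this sketch each worker would pick a shard from `pending_shards(...)`, write its transcriptions to a file of the same name under `result_dir`, and any process can call `merge_results(...)` once nothing is pending. Keying completion on the existence of a per-shard output file is what lets a second cluster job resume instead of starting over.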