Release NVIDIA Neural Modules 2.7.0 · NVIDIA-NeMo/NeMo

Highlights

Speech
- Adds Per-Stream Phrase Boosting in ASR Decoding (Transducers) #15125
- Adds support for streaming speech translation #15132
- Releases a new model, nemotron-speech-streaming-en-0.6b, for English streaming ASR
- Releases a new TTS model, magpie_tts_multilingual_357m, for multilingual Text-to-Speech

Deprecation notice: starting with the next release, NeMo 2.8.0, the following collections will be removed: avlm, diffusion, llm, multimodal, multimodal-autoregressive, nlp, speechlm, vision, vlm. From that release on, this repo will focus solely on speech tasks: ASR, TTS, speaker diarization, and speech enhancement.
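To gauge how the removal notice above affects an existing codebase, a small import scan can help. The following is a minimal sketch, not an official migration tool: the helper name find_removed_imports is hypothetical, and it assumes the collections live under the nemo.collections package and that the hyphenated name multimodal-autoregressive corresponds to multimodal_autoregressive as a Python module.

```python
import ast

# Collections named in the 2.8.0 removal notice. Assumption: the hyphenated
# "multimodal-autoregressive" maps to an underscore in the package path.
REMOVED_COLLECTIONS = {
    "avlm", "diffusion", "llm", "multimodal", "multimodal_autoregressive",
    "nlp", "speechlm", "vision", "vlm",
}


def find_removed_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, collection) pairs for imports that
    reference nemo.collections.<collection> for a collection slated for removal."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Cover both `from nemo.collections.llm import x`
            # and `from nemo.collections import llm`.
            modules = [node.module]
            modules += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for module in modules:
            parts = module.split(".")
            if (
                len(parts) > 2
                and parts[:2] == ["nemo", "collections"]
                and parts[2] in REMOVED_COLLECTIONS
            ):
                hits.add((node.lineno, parts[2]))
    return sorted(hits)


if __name__ == "__main__":
    example = (
        "import nemo.collections.asr as nemo_asr\n"
        "from nemo.collections import llm\n"
    )
    for lineno, collection in find_removed_imports(example):
        print(f"line {lineno}: imports removed collection '{collection}'")
```

The scan parses the source text only and never imports NeMo, so it can run in an environment where NeMo is not installed. Matching is exact on the collection name, so an import of nemo.collections.asr is not flagged.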

Detailed Changelogs:

ASR

Changelog

Enable CUDA graphs in streaming tests by @artbataev :: PR: #14953
Update ctc-segmentation by @chtruong814 :: PR: #14991
check asr models by @nithinraok :: PR: #14989
Unified inference of streaming ASR by @naymaraq :: PR: #14817
Update numba to numba-cuda and update cuda python bindings usage by @chtruong814 :: PR: #15018
Fixing lines for multispeaker pipeline by @tango4j :: PR: #15030
Inference optimization for cache-aware pipelines by @naymaraq :: PR: #15035
fix loading of hyb ctc rnnt bpe models when using from pretrained by @nithinraok :: PR: #15042
removed old buffered CTC script by @naymaraq :: PR: #15061
remove nlp related notebooks by @nithinraok :: PR: #15070
Update MagpieTTS model with latest changes by @blisc :: PR: #15031
ASR inference: expose RNN-T decoding params for context biasing by @artbataev :: PR: #15091
update notebook by @nithinraok :: PR: #15093
Fix: Obsolete Attribute [SDE] by @Jorjeous :: PR: #15105
Upgrade NeMo ASR tutorials from Mozilla/CommonVoice to Google/FLEURS by @KunalDhawan :: PR: #15103
Add support for AIS batch loading for ASR audio processing by @gaikwadabhishek :: PR: #15102
Multi-Talker Parakeet Streaming - NeMo Documents and Tutorial Notebooks PR 03 by @tango4j :: PR: #15025
[Fix] Fix the notebook errors on multispeaker data simulation and end to end diarization training by @tango4j :: PR: #15149
Streaming transducer inference: fix memory usage, improve WER by @artbataev :: PR: #15148
Execute with subprocess list by @nithinraok :: PR: #15165
Chunking fix by @nune-tadevosyan :: PR: #15163
ASR Decoding: allow fallback to CUDA graphs without while loops by @artbataev :: PR: #15173
remove nlp/modules by @dimapihtar :: PR: #14934
Asr numpy 2 fix by @nithinraok :: PR: #15166
Adding flexible input sources for Diarization Mixin by @tango4j :: PR: #15184
Add support for streaming speech translation by @naymaraq :: PR: #15132
Confidence fix get_correct_marks by @nune-tadevosyan :: PR: #15128
Chunking edge cases by @nune-tadevosyan :: PR: #15182
update subprocess cmd by @nithinraok :: PR: #15218
Changes required for enabling prompt based models in Nemo Inference by @arushidNV :: PR: #15036
Fixing the missing sample_rate argument in mixin calling in Sortformer model file by @tango4j :: PR: #15228
Fix audio tensor loading canary2 by @nithinraok :: PR: #15265
Fix word confidence return by @nithinraok :: PR: #15249
feat(asr): add optional auxiliary timestamp model restoration for Canary by @chaosido :: PR: #15268
Performance: Optimize .nemo tar extraction & model config processing by @paulirish :: PR: #15245
fix speech commands notebook by @nithinraok :: PR: #15290
fix timestamps processing with audio tensor input by @nithinraok :: PR: #15291
Update conv_asr.py preventing unnecessary calculations by @tamilselvan0x0 :: PR: #15239
Bump to pytorch 25.11 by @chtruong814 :: PR: #15247
Add FeatureBuffer support to Cache-Aware streaming pipeline by @arushidNV :: PR: #15188
Per-Stream Phrase Boosting in ASR Decoding (Transducers) by @artbataev :: PR: #15125
Sort audio by duration in ASR streaming inference script by @artbataev :: PR: #15297
ASR transcribe: fix forced decoder reinstantiation with timestamps=True by @artbataev :: PR: #15298
Removes use of torchaudio and moves transforms inside of NeMo by @blisc :: PR: #15211
Add sacrebleu to ASR requirements by @pzelasko :: PR: #15016
SpeechLM2 : Add support for offset key in Multimodal conversation by @AudranBert :: PR: #15281
Add cross-attention to output hypotheses by @mgaido91 :: PR: #15229
Add warm-ups for RTFX calculation in streaming ASR pipelines by @naymaraq :: PR: #15313
Speedup buffered transducer inference: remove double decoding by @artbataev :: PR: #15301
improve canary performance on short audio by @nithinraok :: PR: #15317
Transducer Decoding: Move fusion models to the base class by @artbataev :: PR: #15322
Add typing to speech_to_text_finetune.py by @Garvys :: PR: #15326
Bugfix: correct fusion scores for TDT by @artbataev :: PR: #15325
Fix ASR streaming script: correctly add biasing requests to model by @artbataev :: PR: #15334
Fix ASR context biasing in streaming TDT decoding by @artbataev :: PR: #15327

TTS

Changelog

Remove HeteronymClassificationModel by @blisc :: PR: #14980
remove nlp.parts collection by @dimapihtar :: PR: #14617
Update MagpieTTS model with latest changes by @blisc :: PR: #15031
remove nlp/modules by @dimapihtar :: PR: #14934
[TTS] MagpieTTS Inference Refactoring by @subhankar-ghosh :: PR: #15178
[DRAFT][TTS] Magpietts Simple API and loading audiocodec from Huggingface by @subhankar-ghosh :: PR: #15172
[TTS][MagpieTTS] Change French tokenizer to use 'french_chartokenizer' by @subhankar-ghosh :: PR: #15205
Add Japanese g2p katakana accent support by @quapham :: PR: #15170
[TTS][MagpieTTS] Longform TTS using MagpieTTS by @subhankar-ghosh :: PR: #15210
[voice agent] Fixing the missing arguments calling in NemoSTTService by @SangwonSUH :: PR: #15233
[TTS] MagpieTTS inference: Add command line option to select a subset of datasets to run inference on by @rfejgin :: PR: #15212
[TTS] Allow inference without reference audio by @rfejgin :: PR: #15213
[TTS] Refactor Magpie to support codec conversion and bandwidth extension by @rlangman :: PR: #15191
[TTS] MagpieTTS: Implement Frechet Codec Distance metric + some minor inference bugfixes by @rfejgin :: PR: #15223
Update MagpieTTS' Inference Parameter Configuration by @blisc :: PR: #15254
[TTS][MagpieTTS] Add longform capability to do_tts method by @subhankar-ghosh :: PR: #15241
[TTS] Add tests of the MagpieTTS inference CLI by @rfejgin :: PR: #15272
[MagpieTTS][TTS] Support local transformer in longform magpietts by @subhankar-ghosh :: PR: #15296
Removes use of torchaudio and moves transforms inside of NeMo by @blisc :: PR: #15211
[MagpieTTS][Docs] Add magpietts docs by @subhankar-ghosh :: PR: #15302
Add Hindi (hi-IN) support for TTS by @quapham :: PR: #15248
build: Explicitly set torch >= 2.6.0 and remove weights_only=False by @chtruong814 :: PR: #15314
[MagpieTTS] Fix incorrect sort order comment in pareto_rank function by @matteolippi :: PR: #15333

NLP / NMT

Changelog

remove nlp.parts collection by @dimapihtar :: PR: #14617
chore: remove ExportDeploy by @pablo-garay :: PR: #15033
remove nlp related notebooks by @nithinraok :: PR: #15070
Add deprecation notice to modules by @chtruong814 :: PR: #15050
[OMNIML-3034] ModelOpt rename from TRT ModelOpt to ModelOpt by @yueshen2016 :: PR: #15147
remove nlp/modules by @dimapihtar :: PR: #14934
Add support for streaming speech translation by @naymaraq :: PR: #15132
Remove hardcoded DEBUG logging level in gpt_oss.py by @yurekami :: PR: #15236
Docs: replace removed preprocess_data_for_megatron.py with Megatron-L… by @Saibabu7770 :: PR: #15222
remove nlp documentation by @dimapihtar :: PR: #15304
fix speech translation vllm dockerfile by @naymaraq :: PR: #15310

Text Normalization / Inverse Text Normalization

Changelog

Add import guards for mcore lightning module by @chtruong814 :: PR: #14970
chore: update Lightning requirements version by @liquor233 :: PR: #15004

NeMo Tools

Changelog

Fix: Obsolete Attribute [SDE] by @Jorjeous :: PR: #15105
Updated tutorial on SDE, due to changes in colab and libraries by @Jorjeous :: PR: #15137

Export

Changelog

chore: remove ExportDeploy by @pablo-garay :: PR: #15033
[OMNIML-3034] ModelOpt rename from TRT ModelOpt to ModelOpt by @yueshen2016 :: PR: #15147
fix: Raise exception in nemo.export instead of allowing pickle.loads by @chtruong814 :: PR: #15266

Bugfixes

Changelog

Fix PEFT resume with resume_from_path by @maanug-nv :: PR: #14966
Update deprecated env var by @maanug-nv :: PR: #14975
Revert lhotse patch after updating to lhotse 1.32.2 by @chtruong814 :: PR: #15329

Uncategorized:

Changelog

Version bump to 2.7.0rc0.dev0 by @github-actions[bot] :: PR: #14956
Update changelog for v2.5.1 by @github-actions[bot] :: PR: #14967
Bump MCore, TE, Pytorch, and modelopt for 25.11 by @chtruong814 :: PR: #14946
Remove code related to nemo-evaluator (aka nemo-eval) by @athitten :: PR: #14964
Update changelog for r2.5.0 by @github-actions[bot] :: PR: #14990
Add clear resharding error message by @mikolajblaz :: PR: #14962
Fix Evo2 checkpoint backward compatibility by @farhadrgh :: PR: #14914
Pass timeout when running speech functional tests by @chtruong814 :: PR: #15012
[Voice Agent] Fix text aggregation, eob handling, logging by @stevehuang52 :: PR: #14951
Fix speechlm inference configuration by @stevehuang52 :: PR: #14931
Enable EP in PTQ by @jenchen13 :: PR: #15015
revert ckpt scripts removal from #14617 by @dimapihtar :: PR: #15048
fix: fix update-buildcache workflow after ED remove by @pablo-garay :: PR: #15051
Update changelog for v2.5.3 by @github-actions[bot] :: PR: #15055
[voice agent] Fix RTVI missing bot message by @stevehuang52 :: PR: #15068
[voice agent] make parakeet-eou model default stt by @stevehuang52 :: PR: #15069
chore: Remove Automodel module by @thomasdhc :: PR: #15044
add support for parallel ckpt removal by @dimapihtar :: PR: #15073
Fix VLM mcore engine by @meatybobby :: PR: #15076
Revert "Fix vlm engine changes in mcore (#15076)" by @pablo-garay :: PR: #15090
fix: fix lines with malformed anchor tags by @pablo-garay :: PR: #15095
Update Gemma3VL model training scripts by @genquan9 :: PR: #15041
fix MR layer b2b filter to be compatible with baseline FFTConv by @moradza :: PR: #15100
guard trust_remote_code by @dimapihtar :: PR: #15065
Fix get_new_ctm_lines_from_alignments function in scripts/speaker_tasks/create_alignment_manifest.py by @KunalDhawan :: PR: #15118
Change title to 'NVIDIA NeMo Speech Collection' by @snowmanwwg :: PR: #15127
remove pinning cuda bindings by @nithinraok :: PR: #15183
Update URL to ModelOpt Speculative by @AAnoosheh :: PR: #15075
remove ckpt_save_pre_mcore_014 support by @dimapihtar :: PR: #15146
Removing pip install instruction for NeMo voice agent environment setting by @tango4j :: PR: #15101
replace deprecated type comment with type annotation. by @XuesongYang :: PR: #15175
Set L2_NeMo_2_llama3_pretraining_recipe to be optional by @chtruong814 :: PR: #15192
Update README with latest news on nano-v3 by @snowmanwwg :: PR: #15197
[speechlm] replace pickle.loads with json.loads by @stevehuang52 :: PR: #15232
[Fix] Fix safety issue for fsdp_dtensor by @BoxiangW :: PR: #15227
[voice agent] Add examples for tool calling by @stevehuang52 :: PR: #15243
Update README to reflect current Repo Status by @nithinraok :: PR: #15217
remove checks for hydra installation by @nithinraok :: PR: #15267
[voice agent] Improve tool calling and logging ux by @stevehuang52 :: PR: #15269
Implement Nemotron-VoiceChat Speech Decoder by @Edresson :: PR: #15066
Update CONTRIBUTING.md by @chtruong814 :: PR: #15260
Update changelog for r2.6.0 by @github-actions[bot] :: PR: #15282
set dynamo=False to support latest version of pytorch by @nithinraok :: PR: #15284
Fix progress_printer using wrong variable in on_test_batch_end by @yurekami :: PR: #15237
[voice agent] Add audio logging to NeMo Voice Agent by @tango4j :: PR: #15279
bump transformers version by @nithinraok :: PR: #15271
Use PurePosixPath for cross-platform path handling by @yurekami :: PR: #15238
Fix website link for clearml by @orena1 :: PR: #14128
Clipping, lowpass and lossy codec online augmentations for Lhotse dataloader/sampler by @racoiaws :: PR: #14809
ci: Enable label to force run CI tests by @chtruong814 :: PR: #15242
Remove Codeowners for now by @blisc :: PR: #15307
unset weights_only=False by @dimapihtar :: PR: #15312
update readme by @nithinraok :: PR: #15341
add safe globals to documentation by @dimapihtar :: PR: #15342
Security: Fix command injection and insecure permissions by @AkCodes23 :: PR: #15288
Freeze tags in r2.7.0 by @github-actions[bot] :: PR: #15351
Update Imports in Audio Notebook (15345) into r2.7.0 by @blisc :: PR: #15352
cp: Clarify when to use TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD (15353) into r2.7.0 by @chtruong814 :: PR: #15359
cp: Fix macro accuracy when changing labels (15379) into r2.7.0 by @chtruong814 :: PR: #15386
cp: [voice agent] fix dependency for nemo26.02 (15380) into r2.7.0 by @chtruong814 :: PR: #15383
cp: Remove deprecated LLM, VLM, and diffusion tutorials (15357) into r2.7.0 by @chtruong814 :: PR: #15392
update yaml to include new location of losses that were removed in #15211 (r.2.7.0 fix) by @blisc :: PR: #15391
cp: fixes nemo tutorial for loading non registered classes (15398) into r2.7.0 by @chtruong814 :: PR: #15399
cp: default weights to false (15397) into r2.7.0 by @chtruong814 :: PR: #15401
cp: Fixing could not find ctc_segmentation. in CTC tutorial (15403) into r2.7.0 by @chtruong814 :: PR: #15404
cp: Adapt to use env variable for adapter mixin model loading (15406) into r2.7.0 by @chtruong814 :: PR: #15407
Fix BNR 2.0 inference alignment error with input signal padding by @ManasiRemane :: PR: #15390
chore: Remove pre-release suffix for 2.7.0 by @chtruong814 :: PR: #15415
cp: Update release workflow to include generated changelog (#15429) by @chtruong814 :: PR: #15430
