Highlights
- Speech
  - Adds Per-Stream Phrase Boosting in ASR Decoding (Transducers) #15125
  - Adds support for streaming speech translation #15132
  - Released new model nemotron-speech-streaming-en-0.6b that performs English Streaming ASR
  - Released new TTS model magpie_tts_multilingual_357m for multilingual Text-to-Speech
Starting with the next release, NeMo 2.8.0, the following collections will be removed: avlm, diffusion, llm, multimodal, multimodal-autoregressive, nlp, speechlm, vision, and vlm. From then on, this repo will focus solely on speech tasks: ASR, TTS, speaker diarization, and speech enhancement.
Detailed Changelogs:
ASR
Changelog
- Enable CUDA graphs in streaming tests by @artbataev :: PR: #14953
- Update ctc-segmentation by @chtruong814 :: PR: #14991
- check asr models by @nithinraok :: PR: #14989
- Unified inference of streaming ASR by @naymaraq :: PR: #14817
- Update numba to numba-cuda and update cuda python bindings usage by @chtruong814 :: PR: #15018
- Fixing lines for multispeaker pipeline by @tango4j :: PR: #15030
- Inference optimization for cache-aware pipelines by @naymaraq :: PR: #15035
- fix loading of hyb ctc rnnt bpe models when using from pretrained by @nithinraok :: PR: #15042
- removed old buffered CTC script by @naymaraq :: PR: #15061
- remove nlp related notebooks by @nithinraok :: PR: #15070
- Update MagpieTTS model with latest changes by @blisc :: PR: #15031
- ASR inference: expose RNN-T decoding params for context biasing by @artbataev :: PR: #15091
- update notebook by @nithinraok :: PR: #15093
- Fix: Obsolete Attribute [SDE] by @Jorjeous :: PR: #15105
- Upgrade NeMo ASR tutorials from Mozilla/CommonVoice to Google/FLEURS by @KunalDhawan :: PR: #15103
- Add support for AIS batch loading for ASR audio processing by @gaikwadabhishek :: PR: #15102
- Multi-Talker Parakeet Streaming - NeMo Documents and Tutorial Notebooks PR 03 by @tango4j :: PR: #15025
- [Fix] Fix the notebook errors on multispeaker data simulation and end to end diarization training by @tango4j :: PR: #15149
- Streaming transducer inference: fix memory usage, improve WER by @artbataev :: PR: #15148
- Execute with subprocess list by @nithinraok :: PR: #15165
- Chunking fix by @nune-tadevosyan :: PR: #15163
- ASR Decoding: allow fallback to CUDA graphs without while loops by @artbataev :: PR: #15173
- remove nlp/modules by @dimapihtar :: PR: #14934
- Asr numpy 2 fix by @nithinraok :: PR: #15166
- Adding flexible input sources for Diarization Mixin by @tango4j :: PR: #15184
- Add support for streaming speech translation by @naymaraq :: PR: #15132
- Confidence fix get_correct_marks by @nune-tadevosyan :: PR: #15128
- Chunking edge cases by @nune-tadevosyan :: PR: #15182
- update subprocess cmd by @nithinraok :: PR: #15218
- Changes required for enabling prompt based models in Nemo Inference by @arushidNV :: PR: #15036
- Fixing the missing sample_rate argument in mixin calling in Sortformer model file by @tango4j :: PR: #15228
- Fix audio tensor loading canary2 by @nithinraok :: PR: #15265
- Fix word confidence return by @nithinraok :: PR: #15249
- feat(asr): add optional auxiliary timestamp model restoration for Canary by @chaosido :: PR: #15268
- Performance: Optimize .nemo tar extraction & model config processing by @paulirish :: PR: #15245
- fix speech commands notebook by @nithinraok :: PR: #15290
- fix timestamps processing with audio tensor input by @nithinraok :: PR: #15291
- Update conv_asr.py preventing unnecessary calculations by @tamilselvan0x0 :: PR: #15239
- Bump to pytorch 25.11 by @chtruong814 :: PR: #15247
- Add FeatureBuffer support to Cache-Aware streaming pipeline by @arushidNV :: PR: #15188
- Per-Stream Phrase Boosting in ASR Decoding (Transducers) by @artbataev :: PR: #15125
- Sort audio by duration in ASR streaming inference script by @artbataev :: PR: #15297
- ASR transcribe: fix forced decoder reinstantiation with `timestamps=True` by @artbataev :: PR: #15298
- Removes use of torchaudio and moves transforms inside of NeMo by @blisc :: PR: #15211
- Add sacrebleu to ASR requirements by @pzelasko :: PR: #15016
- SpeechLM2 : Add support for offset key in Multimodal conversation by @AudranBert :: PR: #15281
- Add cross-attention to output hypotheses by @mgaido91 :: PR: #15229
- Add warm-ups for RTFX calculation in streaming ASR pipelines by @naymaraq :: PR: #15313
- Speedup buffered transducer inference: remove double decoding by @artbataev :: PR: #15301
- improve canary performance on short audio by @nithinraok :: PR: #15317
- Transducer Decoding: Move fusion models to the base class by @artbataev :: PR: #15322
- Add typing to speech_to_text_finetune.py by @Garvys :: PR: #15326
- Bugfix: correct fusion scores for TDT by @artbataev :: PR: #15325
- Fix ASR streaming script: correctly add biasing requests to model by @artbataev :: PR: #15334
- Fix ASR context biasing in streaming TDT decoding by @artbataev :: PR: #15327
TTS
Changelog
- Remove HeteronymClassificationModel by @blisc :: PR: #14980
- remove nlp.parts collection by @dimapihtar :: PR: #14617
- Update MagpieTTS model with latest changes by @blisc :: PR: #15031
- remove nlp/modules by @dimapihtar :: PR: #14934
- [TTS] MagpieTTS Inference Refactoring by @subhankar-ghosh :: PR: #15178
- [DRAFT][TTS] Magpietts Simple API and loading audiocodec from Huggingface by @subhankar-ghosh :: PR: #15172
- [TTS][MagpieTTS] Change French tokenizer to use 'french_chartokenizer' by @subhankar-ghosh :: PR: #15205
- Add Japanese g2p katakana accent support by @quapham :: PR: #15170
- [TTS][MagpieTTS] Longform TTS using MagpieTTS by @subhankar-ghosh :: PR: #15210
- [voice agent] Fixing the missing arguments calling in `NemoSTTService` by @SangwonSUH :: PR: #15233
- [TTS] MagpieTTS inference: Add command line option to select a subset of datasets to run inference on by @rfejgin :: PR: #15212
- [TTS] Allow inference without reference audio by @rfejgin :: PR: #15213
- [TTS] Refactor Magpie to support codec conversion and bandwidth extension by @rlangman :: PR: #15191
- [TTS] MagpieTTS: Implement Frechet Codec Distance metric + some minor inference bugfixes by @rfejgin :: PR: #15223
- Update MagpieTTS' Inference Parameter Configuration by @blisc :: PR: #15254
- [TTS][MagpieTTS] Add longform capability to do_tts method by @subhankar-ghosh :: PR: #15241
- [TTS] Add tests of the MagpieTTS inference CLI by @rfejgin :: PR: #15272
- [MagpieTTS][TTS] Support local transformer in longform magpietts by @subhankar-ghosh :: PR: #15296
- Removes use of torchaudio and moves transforms inside of NeMo by @blisc :: PR: #15211
- [MagpieTTS][Docs] Add magpietts docs by @subhankar-ghosh :: PR: #15302
- Add Hindi (hi-IN) support for TTS by @quapham :: PR: #15248
- build: Explicitly set torch >= 2.6.0 and remove weights_only=False by @chtruong814 :: PR: #15314
- [MagpieTTS] Fix incorrect sort order comment in pareto_rank function by @matteolippi :: PR: #15333
NLP / NMT
Changelog
- remove nlp.parts collection by @dimapihtar :: PR: #14617
- chore: remove ExportDeploy by @pablo-garay :: PR: #15033
- remove nlp related notebooks by @nithinraok :: PR: #15070
- Add deprecation notice to modules by @chtruong814 :: PR: #15050
- [OMNIML-3034] ModelOpt rename from TRT ModelOpt to ModelOpt by @yueshen2016 :: PR: #15147
- remove nlp/modules by @dimapihtar :: PR: #14934
- Add support for streaming speech translation by @naymaraq :: PR: #15132
- Remove hardcoded DEBUG logging level in gpt_oss.py by @yurekami :: PR: #15236
- Docs: replace removed preprocess_data_for_megatron.py with Megatron-L… by @Saibabu7770 :: PR: #15222
- remove nlp documentation by @dimapihtar :: PR: #15304
- fix speech translation vllm dockerfile by @naymaraq :: PR: #15310
Text Normalization / Inverse Text Normalization
Changelog
- Add import guards for mcore lightning module by @chtruong814 :: PR: #14970
- chore: update Lightning requirements version by @liquor233 :: PR: #15004
NeMo Tools
Changelog
Export
Changelog
- chore: remove ExportDeploy by @pablo-garay :: PR: #15033
- [OMNIML-3034] ModelOpt rename from TRT ModelOpt to ModelOpt by @yueshen2016 :: PR: #15147
- fix: Raise exception in nemo.export instead of allowing pickle.loads by @chtruong814 :: PR: #15266
Bugfixes
Changelog
- Fix PEFT resume with `resume_from_path` by @maanug-nv :: PR: #14966
- Update deprecated env var by @maanug-nv :: PR: #14975
- Revert lhotse patch after updating to lhotse 1.32.2 by @chtruong814 :: PR: #15329
Uncategorized:
Changelog
- Version bump to `2.7.0rc0.dev0` by @github-actions[bot] :: PR: #14956
- Update changelog for `v2.5.1` by @github-actions[bot] :: PR: #14967
- Bump MCore, TE, Pytorch, and modelopt for 25.11 by @chtruong814 :: PR: #14946
- Remove code related to nemo-evaluator (aka nemo-eval) by @athitten :: PR: #14964
- Update changelog for `r2.5.0` by @github-actions[bot] :: PR: #14990
- Add clear resharding message error message by @mikolajblaz :: PR: #14962
- Fix Evo2 checkpoint backward compatibility by @farhadrgh :: PR: #14914
- Pass timeout when running speech functional tests by @chtruong814 :: PR: #15012
- [Voice Agent] Fix text aggregation, eob handling, logging by @stevehuang52 :: PR: #14951
- Fix speechlm inference configuration by @stevehuang52 :: PR: #14931
- Enable EP in PTQ by @jenchen13 :: PR: #15015
- revert ckpt scripts removal from #14617 by @dimapihtar :: PR: #15048
- fix: fix update-buildcache workflow after ED remove by @pablo-garay :: PR: #15051
- Update changelog for `v2.5.3` by @github-actions[bot] :: PR: #15055
- [voice agent] Fix RTVI missing bot message by @stevehuang52 :: PR: #15068
- [voice agent] make parakeet-eou model default stt by @stevehuang52 :: PR: #15069
- chore: Remove Automodel module by @thomasdhc :: PR: #15044
- add support for parallel ckpt removal by @dimapihtar :: PR: #15073
- Fix VLM mcore engine by @meatybobby :: PR: #15076
- Revert "Fix vlm engine changes in mcore (#15076)" by @pablo-garay :: PR: #15090
- fix: fix lines with malformed anchor tags by @pablo-garay :: PR: #15095
- Update Gemma3VL model training scripts by @genquan9 :: PR: #15041
- fix MR layer b2b filter to be compatible with baseline FFTConv by @moradza :: PR: #15100
- guard trust_remote_code by @dimapihtar :: PR: #15065
- Fix get_new_ctm_lines_from_alignments function in scripts/speaker_tasks/create_alignment_manifest.py by @KunalDhawan :: PR: #15118
- Change title to 'NVIDIA NeMo Speech Collection' by @snowmanwwg :: PR: #15127
- remove pinning cuda bindings by @nithinraok :: PR: #15183
- Update URL to ModelOpt Speculative by @AAnoosheh :: PR: #15075
- remove ckpt_save_pre_mcore_014 support by @dimapihtar :: PR: #15146
- Removing pip install instruction for NeMo voice agent environment setting by @tango4j :: PR: #15101
- replace deprecated type comment with type annotation. by @XuesongYang :: PR: #15175
- Set L2_NeMo_2_llama3_pretraining_recipe to be optional by @chtruong814 :: PR: #15192
- Update README with latest news on nano-v3 by @snowmanwwg :: PR: #15197
- [speechlm] replace pickle.loads with json.loads by @stevehuang52 :: PR: #15232
- [Fix] Fix safety issue for fsdp_dtensor by @BoxiangW :: PR: #15227
- [voice agent] Add examples for tool calling by @stevehuang52 :: PR: #15243
- Update README to reflect current Repo Status by @nithinraok :: PR: #15217
- remove checks for hydra installation by @nithinraok :: PR: #15267
- [voice agent] Improve tool calling and logging ux by @stevehuang52 :: PR: #15269
- Implement Nemotron-VoiceChat Speech Decoder by @Edresson :: PR: #15066
- Update CONTRIBUTING.md by @chtruong814 :: PR: #15260
- Update changelog for `r2.6.0` by @github-actions[bot] :: PR: #15282
- set dynamo=False to support latest version of pytorch by @nithinraok :: PR: #15284
- Fix progress_printer using wrong variable in on_test_batch_end by @yurekami :: PR: #15237
- [voice agent] Add audio logging to NeMo Voice Agent by @tango4j :: PR: #15279
- bump transformers version by @nithinraok :: PR: #15271
- Use PurePosixPath for cross-platform path handling by @yurekami :: PR: #15238
- Fix website link for clearml by @orena1 :: PR: #14128
- Clipping, lowpass and lossy codec online augmentations for Lhotse dataloader/sampler by @racoiaws :: PR: #14809
- ci: Enable label to force run CI tests by @chtruong814 :: PR: #15242
- Remove Codeowners for now by @blisc :: PR: #15307
- unset weights_only=False by @dimapihtar :: PR: #15312
- update readme by @nithinraok :: PR: #15341
- add safe globals to documentation by @dimapihtar :: PR: #15342
- Security: Fix command injection and insecure permissions by @AkCodes23 :: PR: #15288
- Freeze tags in `r2.7.0` by @github-actions[bot] :: PR: #15351
- Update Imports in Audio Notebook (15345) into r2.7.0 by @blisc :: PR: #15352
- cp: `Clarify when to use TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD (15353)` into `r2.7.0` by @chtruong814 :: PR: #15359
- cp: `Fix macro accuracy when changing labels (15379)` into `r2.7.0` by @chtruong814 :: PR: #15386
- cp: `[voice agent] fix dependency for nemo26.02 (15380)` into `r2.7.0` by @chtruong814 :: PR: #15383
- cp: `Remove deprecated LLM, VLM, and diffusion tutorials (15357)` into `r2.7.0` by @chtruong814 :: PR: #15392
- update yaml to include new location of losses that were removed in #15211 (r.2.7.0 fix) by @blisc :: PR: #15391
- cp: `fixes nemo tutorial for loading non registered classes (15398)` into `r2.7.0` by @chtruong814 :: PR: #15399
- cp: `default weights to false (15397)` into `r2.7.0` by @chtruong814 :: PR: #15401
- cp: `Fixing could not find ctc_segmentation. in CTC tutorial (15403)` into `r2.7.0` by @chtruong814 :: PR: #15404
- cp: `Adapt to use env variable for adapter mixin model loading (15406)` into `r2.7.0` by @chtruong814 :: PR: #15407
- Fix BNR 2.0 inference alignment error with input signal padding by @ManasiRemane :: PR: #15390
- chore: Remove pre-release suffix for 2.7.0 by @chtruong814 :: PR: #15415
- cp: Update release workflow to include generated changelog (#15429) by @chtruong814 :: PR: #15430