Removes use of torchaudio and moves transforms inside of NeMo #15211

Merged
blisc merged 17 commits into NVIDIA-NeMo:main from blisc:tts_2512_removetorchaudio
Jan 16, 2026
@blisc
Collaborator

@blisc blisc commented Dec 19, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Removes use of torchaudio.transforms and moves transforms inside of NeMo.
NOTE: we will use torchsquirm in nemo/collections/audio/metrics/squim.py and nemo/collections/tts/models/magpietts_preference_optimization.py

Collection: audio, asr, tts

Changelog

  • Move frequently used torchaudio transform into NeMo

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Member

@nithinraok nithinraok left a comment

LGTM.

@chtruong814 / @ko3n1g review for docker related changes.

@blisc blisc merged commit 737f9d5 into NVIDIA-NeMo:main Jan 16, 2026
259 checks passed
@blisc blisc deleted the tts_2512_removetorchaudio branch January 16, 2026 15:35
AkCodes23 pushed a commit to AkCodes23/NeMo that referenced this pull request Jan 28, 2026
…-NeMo#15211)

* remove use of torchaudio.transforms; SQUIM todo

Signed-off-by: Jason <jasoli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: blisc <blisc@users.noreply.github.com>

* add renamed file

Signed-off-by: Jason <jasoli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: blisc <blisc@users.noreply.github.com>

* fix autorefactor errors

Signed-off-by: Jason <jasoli@nvidia.com>

* fix linting issues

Signed-off-by: Jason <jasoli@nvidia.com>

* remove unneeded imports inside of audio collection

Signed-off-by: Jason <jasoli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: blisc <blisc@users.noreply.github.com>

* remove torchaudio from more files

Signed-off-by: Jason <jasoli@nvidia.com>

* update tests

Signed-off-by: Jason <jasoli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: blisc <blisc@users.noreply.github.com>

* change audio codec TA call

Signed-off-by: Jason <jasoli@nvidia.com>

* update import statement in speechlm2

Signed-off-by: Jason <jasoli@nvidia.com>

---------

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>
Co-authored-by: blisc <blisc@users.noreply.github.com>
Signed-off-by: Akhil Varanasi <akhilvaranasi23@gmail.com>
blisc added a commit that referenced this pull request Feb 11, 2026
blisc added a commit that referenced this pull request Feb 11, 2026
blisc added a commit that referenced this pull request Feb 12, 2026
nemoramo pushed a commit to nemoramo/MoNeMo that referenced this pull request Feb 13, 2026
blisc added a commit that referenced this pull request Feb 13, 2026
@MahmoudAshraf97
Contributor

This PR is a breaking change for older models. Please take action before it makes it into the next release.

RuntimeError: Error(s) in loading state_dict for EncDecCTCModelBPE:
Missing key(s) in state_dict: "preprocessor.featurizer.window", "preprocessor.featurizer.fb".
Unexpected key(s) in state_dict: "preprocessor.featurizer._mel_spec_extractor.spectrogram.window", "preprocessor.featurizer._mel_spec_extractor.mel_scale.fb".

@pzelasko
Collaborator

Which models is it breaking / how old?

@MahmoudAshraf97
Contributor

MahmoudAshraf97 commented Feb 24, 2026

I managed to reproduce it with a model trained using v1.23.0. Another model, trained using v2.2.1, did not reproduce the issue. These are internal models that I cannot share, but I'm happy to test models published on HF or prepare a minimal repro if needed.

@MahmoudAshraf97
Contributor

Further investigation shows that this is reproducible with any model that was trained with preprocessor.use_torchaudio=True, regardless of the version used to train it.
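
A minimal sketch of how one might flag affected models, assuming the model config has already been loaded into a nested dict and that the flag lives at preprocessor.use_torchaudio as named above. The helper name `uses_torchaudio_preprocessor` and the example configs are hypothetical and are not part of NeMo.

```python
from typing import Any, Mapping


def uses_torchaudio_preprocessor(model_cfg: Mapping[str, Any]) -> bool:
    """Return True if the config enables the torchaudio-based preprocessor.

    Hypothetical helper: assumes the flag is stored at
    model_cfg["preprocessor"]["use_torchaudio"], as named in this thread.
    A missing section or missing flag is treated as "not enabled".
    """
    preprocessor_cfg = model_cfg.get("preprocessor") or {}
    return bool(preprocessor_cfg.get("use_torchaudio", False))


# Illustrative configs only; real model configs contain many more fields.
affected = {"preprocessor": {"use_torchaudio": True, "features": 80}}
unaffected = {"preprocessor": {"features": 80}}

print(uses_torchaudio_preprocessor(affected))
print(uses_torchaudio_preprocessor(unaffected))
```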

@pzelasko
Collaborator

Torchaudio was removed as a dependency. Can you migrate all models to the non-torchaudio preprocessor?

@MahmoudAshraf97
Contributor

I don't mind doing that; in fact, I stopped using it a while ago. The problem arises when we try to load models that were trained with torchaudio in the preprocessor: loading fails. The solution, IMO, would be either translation code that maps the key names in the state dict during model loading, or a script that converts old .nemo files that used torchaudio to a format accepted by the new versions (just rename the parameters in the state dict).
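
A minimal sketch of the key-translation idea, using only the four key names from the error message earlier in this thread. The names `KEY_RENAMES` and `rename_legacy_keys` are hypothetical, and this is not the fix that was adopted in NeMo. It only renames keys: whether the stored tensors have the shape and values the new featurizer expects is an assumption that this thread does not confirm.

```python
from typing import Any, Dict, Mapping

# Old (torchaudio-based) key -> key expected by the new featurizer.
# Both sides are copied from the RuntimeError quoted in this thread.
KEY_RENAMES = {
    "preprocessor.featurizer._mel_spec_extractor.spectrogram.window": "preprocessor.featurizer.window",
    "preprocessor.featurizer._mel_spec_extractor.mel_scale.fb": "preprocessor.featurizer.fb",
}


def rename_legacy_keys(state_dict: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of state_dict with the legacy key names translated.

    Values are passed through untouched; only the names change.
    """
    renamed: Dict[str, Any] = {}
    for key, value in state_dict.items():
        new_key = KEY_RENAMES.get(key, key)
        if new_key in renamed:
            # Old and new names are both present: refuse to silently overwrite.
            raise KeyError(f"duplicate key after renaming: {new_key}")
        renamed[new_key] = value
    return renamed


# Placeholder values stand in for tensors so the example runs without torch.
old_state = {
    "preprocessor.featurizer._mel_spec_extractor.spectrogram.window": "window-tensor",
    "preprocessor.featurizer._mel_spec_extractor.mel_scale.fb": "fb-tensor",
    "encoder.layer.weight": "weight-tensor",
}
new_state = rename_legacy_keys(old_state)
print(sorted(new_state))
```

The translated dict could then be passed to the model's usual state-dict loading path. Any shape conversion of the filterbank or window buffers, if one is needed, would have to be added on top of this renaming.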

@pzelasko
Collaborator

@MahmoudAshraf97 see if this helps #15437


5 participants