WIP: SALM with NeMo Automodel integration for Nemotron Nano V3 LLM backbone#15447

Draft
pzelasko wants to merge 45 commits into main from speechlm2-with-nemo-automodel-merge

Conversation

@pzelasko
Collaborator

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

pzelasko and others added 30 commits February 4, 2026 14:17
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
…automodel's utility

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
…full LLM

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
pzelasko and others added 15 commits February 18, 2026 11:56
…converted models

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Implements NemotronNanoV3PromptFormatter (NAME="nemotron-nano-v3") using
ChatML-style <|im_start|>/<|im_end|> template with encode_dialog override
that handles: auto-insert empty system turn, history thinking truncation,
<think></think> prepend for non-thinking assistant turns, and dynamic
inference prefix (thinking on/off). Includes Lhotse Cut integration via
registered_prompt_format_fn. Verified against HF apply_chat_template for
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (both string and token match).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
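The encoding rules described in the commit message above can be sketched in plain Python. This is a hypothetical illustration reconstructed only from that description (the special tokens <|im_start|>/<|im_end|> and <think></think> are from the commit message; the function shape and turn dict format are assumptions, not the actual NeMo implementation):

```python
# Hypothetical sketch of a ChatML-style Nemotron Nano V3 dialog encoding,
# reconstructed from the commit message; NOT the actual NeMo code.

def encode_dialog(turns, enable_thinking=True):
    # Auto-insert an empty system turn if the dialog doesn't start with one.
    if not turns or turns[0]["role"] != "system":
        turns = [{"role": "system", "content": ""}] + list(turns)
    parts = []
    for i, turn in enumerate(turns):
        content = turn["content"]
        if turn["role"] == "assistant":
            is_last = i == len(turns) - 1
            if not is_last and "</think>" in content:
                # History thinking truncation: drop reasoning from past turns.
                content = content.split("</think>", 1)[1].lstrip("\n")
            if "<think>" not in content:
                # Non-thinking assistant turns get an empty think block.
                content = "<think></think>" + content
        parts.append(f"<|im_start|>{turn['role']}\n{content}<|im_end|>\n")
    # Dynamic inference prefix: open an assistant turn, pre-closing the
    # think block when thinking is disabled.
    prefix = "<|im_start|>assistant\n"
    if not enable_thinking:
        prefix += "<think></think>"
    return "".join(parts) + prefix

demo = encode_dialog(
    [{"role": "user", "content": "Transcribe the audio."}],
    enable_thinking=False,
)
print(demo)
```

The commit message says the real formatter was verified against HF `apply_chat_template` for string and token equality; this sketch has no such guarantee.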
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
@pzelasko
Collaborator Author

Trying to decide whether we should make SALM backward compatible with vanilla transformers LLMs (shares a lot of logic but gets somewhat complex) or copy this into a new class (cleaner but more duplication). In any case, the released canary-qwen-2.5b checkpoint must work with the final shape of this PR.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'json' is not used.

Copilot Autofix

AI 1 day ago

In general, the correct way to fix an unused import in Python is to remove the import statement if the module is never referenced in the file. This reduces visual clutter, avoids implying unnecessary dependencies, and can slightly speed up module import time.

Here, the best fix is to delete the import json line in nemo/collections/common/data/lhotse/text_adapters.py (line 14 in the provided snippet), leaving the rest of the imports unchanged. No additional methods, definitions, or replacement imports are needed, since no code in the shown region uses json. This change preserves all existing functionality because it only removes an unused symbol.
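The check behind this notice can be approximated with the standard library. A minimal sketch (a hypothetical helper, not CodeQL's actual query) that finds top-level imports never referenced by name:

```python
# Minimal sketch: detect unused imports with the stdlib ast module.
# This approximates what the CodeQL "Unused import" query flags.
import ast

def unused_imports(source: str) -> list[str]:
    tree = ast.parse(source)
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported[(alias.asname or alias.name).split(".")[0]] = alias.name
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported[alias.asname or alias.name] = alias.name
    # Every bare-name reference anywhere in the module.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return [name for name in imported if name not in used]

snippet = "import json\nimport logging\nlogging.info('hi')\n"
print(unused_imports(snippet))  # json is imported but never referenced
```

Real linters also account for `__all__`, re-exports, and string annotations, which this sketch ignores.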

Suggested changeset 1
nemo/collections/common/data/lhotse/text_adapters.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/common/data/lhotse/text_adapters.py b/nemo/collections/common/data/lhotse/text_adapters.py
--- a/nemo/collections/common/data/lhotse/text_adapters.py
+++ b/nemo/collections/common/data/lhotse/text_adapters.py
@@ -11,7 +11,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import json
 import logging
 import math
 import random
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +100 to +105
# for turn in turns:
# if turn["role"] == "user" or turn["role"] == "system":
# if "/think" in turn["slots"]["message"]:
# enable_thinking = True
# elif "/no_think" in turn["slots"]["message"]:
# enable_thinking = False

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.

Copilot Autofix

AI 1 day ago

In general, to fix commented-out code you either (a) reinstate it as active code because it is required, or (b) remove it (or convert it into concise explanatory comments) if the behavior is not in use. Here, the function already accepts an enable_thinking flag and the commented block redundantly recalculates it from the content of system/user turns; since this logic is disabled and the docstring describes enable_thinking as a parameter, the least disruptive fix is to remove the commented-out code while preserving the surrounding explanatory comments about step 1. Concretely, in nemo/collections/common/prompts/qwen.py, inside Qwen3PromptFormatter.encode_dialog, delete lines 99–105 that begin with # enable_thinking = True and the subsequent commented for turn in turns: loop. No new methods or imports are needed.

Suggested changeset 1
nemo/collections/common/prompts/qwen.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/common/prompts/qwen.py b/nemo/collections/common/prompts/qwen.py
--- a/nemo/collections/common/prompts/qwen.py
+++ b/nemo/collections/common/prompts/qwen.py
@@ -96,13 +96,6 @@
 
         # 1) (Inference, Optional) Determine if thinking is enabled in user or system turns.
         # If multiple turns have the tag, we will use the last one.
-        # enable_thinking = True  # By default, it is enabled according to Qwen3 prompt format
-        # for turn in turns:
-        #     if turn["role"] == "user" or turn["role"] == "system":
-        #         if "/think" in turn["slots"]["message"]:
-        #             enable_thinking = True
-        #         elif "/no_think" in turn["slots"]["message"]:
-        #             enable_thinking = False
 
         # 2) (Training and Inference) Remove thinking content from previous turns.
         for turn in turns[:-1]:
EOF
with loss_parallel():
super().backward(*args, **kwargs)

def configure_gradient_clipping(self, optimizer, gradient_clip_val, gradient_clip_algorithm=None):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
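The pattern this notice flags can be shown in isolation, independent of the SALM code (function names here are made up for illustration):

```python
# Illustration of CodeQL's "explicit returns mixed with implicit returns":
# one branch returns a value, the other falls off the end of the function
# and implicitly returns None.

def clip_mixed(value, limit):
    if limit is None:
        return value               # explicit return
    value = min(value, limit)      # falls through: implicit `return None`

def clip_consistent(value, limit):
    if limit is None:
        return value
    return min(value, limit)       # every path returns explicitly

print(clip_mixed(5, 3), clip_consistent(5, 3))
```

The autofix below applies the same idea: since `configure_gradient_clipping` is called for its side effects, every path is made to return `None` explicitly.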

Copilot Autofix

AI 1 day ago

General fix: Ensure that all code paths in configure_gradient_clipping return explicitly and consistently. Since this method is used for side effects, the simplest is to not return the result of super().configure_gradient_clipping(...) and instead always return None (or just return) at the end, after any side-effectful calls.

Concrete best fix for this file:

  • In nemo/collections/speechlm2/models/salm.py, in configure_gradient_clipping:
    • Change the if not self._use_fsdp ... branch to call super().configure_gradient_clipping(...) but not return its value.
    • After the conditional logic (and after the if params: block), add an explicit return None (or return) so that every path in the function returns explicitly.
  • This keeps behavior identical:
    • In the “no FSDP / no clipping” branch, Lightning’s default configure_gradient_clipping still runs for its side effects.
    • In the FSDP branch, _clip_grad_norm_impl still runs.
    • Callers now always get None (which is what they effectively got before in practice).

No new imports, methods, or definitions are required.

Suggested changeset 1
nemo/collections/speechlm2/models/salm.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/speechlm2/models/salm.py b/nemo/collections/speechlm2/models/salm.py
--- a/nemo/collections/speechlm2/models/salm.py
+++ b/nemo/collections/speechlm2/models/salm.py
@@ -319,12 +319,14 @@
         ``(mesh_id, placements)`` and combines per-group norms as plain tensors.
         """
         if not self._use_fsdp or gradient_clip_val is None or gradient_clip_val <= 0:
-            return super().configure_gradient_clipping(optimizer, gradient_clip_val, gradient_clip_algorithm)
+            super().configure_gradient_clipping(optimizer, gradient_clip_val, gradient_clip_algorithm)
+            return None
         from nemo_automodel.components.training.utils import _clip_grad_norm_impl
 
         params = [p for group in optimizer.param_groups for p in group["params"] if p.grad is not None]
         if params:
             _clip_grad_norm_impl(params, max_norm=gradient_clip_val)
+        return None
 
     @torch.no_grad()
     def generate(
EOF
import torch
from lhotse import CutSet, SupervisionSegment
from lhotse.testing.dummies import dummy_cut, dummy_recording
from omegaconf import DictConfig, OmegaConf

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'OmegaConf' is not used.

Copilot Autofix

AI 1 day ago

To fix the problem, remove the unused symbol OmegaConf from the import statement so that only DictConfig is imported. This keeps the dependency on omegaconf minimal while preserving all existing functionality, since DictConfig is actively used and OmegaConf is not.

Concretely, in tests/collections/speechlm2/test_salm_automodel_lora.py, at the import block near the top of the file, change the line:

from omegaconf import DictConfig, OmegaConf

to:

from omegaconf import DictConfig

No other code changes, methods, or additional imports are required.

Suggested changeset 1
tests/collections/speechlm2/test_salm_automodel_lora.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/collections/speechlm2/test_salm_automodel_lora.py b/tests/collections/speechlm2/test_salm_automodel_lora.py
--- a/tests/collections/speechlm2/test_salm_automodel_lora.py
+++ b/tests/collections/speechlm2/test_salm_automodel_lora.py
@@ -19,7 +19,7 @@
 import torch
 from lhotse import CutSet, SupervisionSegment
 from lhotse.testing.dummies import dummy_cut, dummy_recording
-from omegaconf import DictConfig, OmegaConf
+from omegaconf import DictConfig
 
 from nemo.collections.common.data.lhotse import NeMoMultimodalConversation
 from nemo.collections.common.data.lhotse.text_adapters import AudioTurn, TextTurn
EOF
Comment on lines +30 to +35
from nemo.collections.speechlm2.parts.automodel_lora import (
LORA_PARAM_PATTERN,
ensure_lora_trainable,
make_peft_config,
maybe_install_lora,
)

Check notice

Code scanning / CodeQL

Unused import Note test

Import of 'maybe_install_lora' is not used.

Copilot Autofix

AI 1 day ago

To fix the problem, remove the unused name maybe_install_lora from the multi-name import, while keeping the other imported, used symbols intact. This avoids changing any runtime behavior, because the module will still be imported due to the remaining names, and only the unused symbol binding is removed from this file’s namespace.

Concretely, in tests/collections/speechlm2/test_salm_automodel_lora.py, locate the import block starting at line 30: from nemo.collections.speechlm2.parts.automodel_lora import (...). Edit the parenthesized list to drop the maybe_install_lora entry, leaving LORA_PARAM_PATTERN, ensure_lora_trainable, and make_peft_config unchanged and in place. No other code, imports, or logic need to be modified.

Suggested changeset 1
tests/collections/speechlm2/test_salm_automodel_lora.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/collections/speechlm2/test_salm_automodel_lora.py b/tests/collections/speechlm2/test_salm_automodel_lora.py
--- a/tests/collections/speechlm2/test_salm_automodel_lora.py
+++ b/tests/collections/speechlm2/test_salm_automodel_lora.py
@@ -31,7 +31,6 @@
     LORA_PARAM_PATTERN,
     ensure_lora_trainable,
     make_peft_config,
-    maybe_install_lora,
 )
 
 if torch.cuda.is_available():
EOF
@desh2608

Trying to decide whether we should make SALM backward compatible with vanilla transformers LLMs (shares a lot of logic but gets somewhat complex) or copy this into a new class (cleaner but more duplication). In any case, the released canary-qwen-2.5b checkpoint must work with the final shape of this PR.

(copying my comment from Slack here) Does the current PR already work with both HF Automodel and NeMo Automodel? If yes, it looks fine to me. Most of the complexity around model loading seems to be in configure_model() and some utility functions in pretrained.py. Other than that, the annoying part is having to convert a DTensor to a full tensor for some operations (I had to do the same when adding the audio generation head), but I think it's not too bad.

