WIP: SALM with NeMo Automodel integration for Nemotron Nano V3 LLM backbone#15447
Conversation
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
Implements `NemotronNanoV3PromptFormatter` (NAME="nemotron-nano-v3") using a ChatML-style `<|im_start|>`/`<|im_end|>` template with an `encode_dialog` override that handles:

- auto-inserting an empty system turn,
- truncating thinking content from history turns,
- prepending `<think></think>` for non-thinking assistant turns,
- a dynamic inference prefix (thinking on/off).

Includes Lhotse Cut integration via `registered_prompt_format_fn`. Verified against HF `apply_chat_template` for nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (both string and token match).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
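For readers unfamiliar with the actual formatter, the listed behaviors can be sketched roughly as follows. This is a hypothetical, simplified illustration, not the real `NemotronNanoV3PromptFormatter` code: the function name `encode_dialog`, the plain-dict turn representation, and the exact placement of `<think></think>` in history turns are assumptions for illustration only.

```python
def encode_dialog(turns, enable_thinking=False):
    """Render a list of {"role": ..., "content": ...} dicts into a
    ChatML-style prompt string (hypothetical sketch, not the NeMo API)."""
    # 1) Auto-insert an empty system turn if the dialog does not start with one.
    if not turns or turns[0]["role"] != "system":
        turns = [{"role": "system", "content": ""}] + turns

    rendered = []
    last = len(turns) - 1
    for i, turn in enumerate(turns):
        content = turn["content"]
        if turn["role"] == "assistant":
            if i < last and "</think>" in content:
                # 2) Truncate thinking content from history (non-final) turns.
                content = content.split("</think>", 1)[1].lstrip()
            elif "<think>" not in content:
                # 3) Mark non-thinking assistant turns explicitly.
                content = "<think></think>" + content
        rendered.append(f"<|im_start|>{turn['role']}\n{content}<|im_end|>\n")

    # 4) Dynamic inference prefix: open an assistant turn for generation,
    #    pre-closing the thinking span when thinking is disabled.
    prefix = "<|im_start|>assistant\n"
    if not enable_thinking:
        prefix += "<think></think>"
    return "".join(rendered) + prefix
```

The real implementation is verified token-for-token against the HF chat template, so treat the above only as a mental model of the four behaviors.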
Trying to decide if we should make SALM backward compatible with vanilla
```python
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json
```
Check notice
Code scanning / CodeQL
Unused import Note
Copilot Autofix:
In general, the correct way to fix an unused import in Python is to remove the import statement if the module is never referenced in the file. This reduces visual clutter, avoids implying unnecessary dependencies, and can slightly speed up module import time.
Here, the best fix is to delete the `import json` line in `nemo/collections/common/data/lhotse/text_adapters.py` (line 14 in the provided snippet), leaving the rest of the imports unchanged. No additional methods, definitions, or replacement imports are needed, since no code in the shown region uses `json`. This change preserves all existing functionality because it only removes an unused symbol.
```diff
@@ -11,7 +11,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import json
 import logging
 import math
 import random
```
```python
# for turn in turns:
#     if turn["role"] == "user" or turn["role"] == "system":
#         if "/think" in turn["slots"]["message"]:
#             enable_thinking = True
#         elif "/no_think" in turn["slots"]["message"]:
#             enable_thinking = False
```
Check notice
Code scanning / CodeQL
Commented-out code Note
Copilot Autofix:
In general, to fix commented-out code you either (a) reinstate it as active code because it is required, or (b) remove it (or convert it into concise explanatory comments) if the behavior is not in use. Here, the function already accepts an `enable_thinking` flag and the commented block redundantly recalculates it from the content of system/user turns; since this logic is disabled and the docstring describes `enable_thinking` as a parameter, the least disruptive fix is to remove the commented-out code while preserving the surrounding explanatory comments about step 1. Concretely, in `nemo/collections/common/prompts/qwen.py`, inside `Qwen3PromptFormatter.encode_dialog`, delete lines 99–105 that begin with `# enable_thinking = True` and the subsequent commented `for turn in turns:` loop. No new methods or imports are needed.
```diff
@@ -96,13 +96,6 @@

         # 1) (Inference, Optional) Determine if thinking is enabled in user or system turns.
         # If multiple turns have the tag, we will use the last one.
-        # enable_thinking = True  # By default, it is enabled according to Qwen3 prompt format
-        # for turn in turns:
-        #     if turn["role"] == "user" or turn["role"] == "system":
-        #         if "/think" in turn["slots"]["message"]:
-        #             enable_thinking = True
-        #         elif "/no_think" in turn["slots"]["message"]:
-        #             enable_thinking = False

         # 2) (Training and Inference) Remove thinking content from previous turns.
         for turn in turns[:-1]:
```
```python
        with loss_parallel():
            super().backward(*args, **kwargs)

    def configure_gradient_clipping(self, optimizer, gradient_clip_val, gradient_clip_algorithm=None):
```
Check notice
Code scanning / CodeQL
Explicit returns mixed with implicit (fall through) returns Note
Copilot Autofix:
General fix: ensure that all code paths in `configure_gradient_clipping` return explicitly and consistently. Since this method is used for its side effects, the simplest approach is to not return the result of `super().configure_gradient_clipping(...)` and instead always `return None` (or just `return`) at the end, after any side-effectful calls.

Concrete best fix for this file, in `nemo/collections/speechlm2/models/salm.py`, in `configure_gradient_clipping`:

- Change the `if not self._use_fsdp ...` branch to call `super().configure_gradient_clipping(...)` but not `return` its value.
- After the conditional logic (and after the `if params:` block), add an explicit `return None` (or `return`) so that every path in the function returns explicitly.

This keeps behavior identical:

- In the "no FSDP / no clipping" branch, Lightning's default `configure_gradient_clipping` still runs for its side effects.
- In the FSDP branch, `_clip_grad_norm_impl` still runs.
- Callers now always get `None` (which is what they effectively got before in practice).

No new imports, methods, or definitions are required.
```diff
@@ -319,12 +319,14 @@
         ``(mesh_id, placements)`` and combines per-group norms as plain tensors.
         """
         if not self._use_fsdp or gradient_clip_val is None or gradient_clip_val <= 0:
-            return super().configure_gradient_clipping(optimizer, gradient_clip_val, gradient_clip_algorithm)
+            super().configure_gradient_clipping(optimizer, gradient_clip_val, gradient_clip_algorithm)
+            return None
         from nemo_automodel.components.training.utils import _clip_grad_norm_impl

         params = [p for group in optimizer.param_groups for p in group["params"] if p.grad is not None]
         if params:
             _clip_grad_norm_impl(params, max_norm=gradient_clip_val)
+        return None

     @torch.no_grad()
     def generate(
```
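Stripped of the Lightning and FSDP context, the "side effects only, explicit return on every path" pattern the autofix describes can be sketched as a standalone function. This is a minimal illustration, not the PR's code: the function name `clip_gradients` and the flat parameter list are assumptions, and vanilla `torch.nn.utils.clip_grad_norm_` stands in for `_clip_grad_norm_impl`.

```python
import torch

def clip_gradients(params, clip_val):
    # Hypothetical standalone version of the pattern: every code path
    # returns explicitly, and the function is used only for its side effects.
    if clip_val is None or clip_val <= 0:
        return None  # nothing to clip; still an explicit return
    grads = [p for p in params if p.grad is not None]
    if grads:
        # In-place scaling so the total grad norm does not exceed clip_val.
        torch.nn.utils.clip_grad_norm_(grads, max_norm=clip_val)
    return None  # explicit return after the side-effectful call
```

Because callers ignore the return value, making every branch return `None` changes nothing observable while satisfying the CodeQL check.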
```python
import torch
from lhotse import CutSet, SupervisionSegment
from lhotse.testing.dummies import dummy_cut, dummy_recording
from omegaconf import DictConfig, OmegaConf
```
Check notice
Code scanning / CodeQL
Unused import Note test
Copilot Autofix:
To fix the problem, remove the unused symbol `OmegaConf` from the import statement so that only `DictConfig` is imported. This keeps the dependency on `omegaconf` minimal while preserving all existing functionality, since `DictConfig` is actively used and `OmegaConf` is not.

Concretely, in `tests/collections/speechlm2/test_salm_automodel_lora.py`, in the import block near the top of the file, change the line `from omegaconf import DictConfig, OmegaConf` to `from omegaconf import DictConfig`. No other code changes, methods, or additional imports are required.
```diff
@@ -19,7 +19,7 @@
 import torch
 from lhotse import CutSet, SupervisionSegment
 from lhotse.testing.dummies import dummy_cut, dummy_recording
-from omegaconf import DictConfig, OmegaConf
+from omegaconf import DictConfig

 from nemo.collections.common.data.lhotse import NeMoMultimodalConversation
 from nemo.collections.common.data.lhotse.text_adapters import AudioTurn, TextTurn
```
```python
from nemo.collections.speechlm2.parts.automodel_lora import (
    LORA_PARAM_PATTERN,
    ensure_lora_trainable,
    make_peft_config,
    maybe_install_lora,
)
```
Check notice
Code scanning / CodeQL
Unused import Note test
Copilot Autofix:
To fix the problem, remove the unused name `maybe_install_lora` from the multi-name import, while keeping the other imported, used symbols intact. This avoids changing any runtime behavior, because the module will still be imported due to the remaining names; only the unused symbol binding is removed from this file's namespace.

Concretely, in `tests/collections/speechlm2/test_salm_automodel_lora.py`, locate the import block starting at line 30 (`from nemo.collections.speechlm2.parts.automodel_lora import (...)`). Edit the parenthesized list to drop the `maybe_install_lora` entry, leaving `LORA_PARAM_PATTERN`, `ensure_lora_trainable`, and `make_peft_config` unchanged and in place. No other code, imports, or logic needs to be modified.
```diff
@@ -31,7 +31,6 @@
     LORA_PARAM_PATTERN,
     ensure_lora_trainable,
     make_peft_config,
-    maybe_install_lora,
 )

 if torch.cuda.is_available():
```
(copying my comment from Slack here) In the current PR, does it already work with both HF Automodel and NeMo Automodel? If yes, it looks fine to me. Most of the complexity around model loading seems to be in `configure_model()` and some utility functions in `pretrained.py`. Other than that, the annoying thing is having to convert a DTensor to a full tensor for some operations (I had to do the same for adding the audio generation head), but I think it's not too bad.
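The DTensor-to-full-tensor conversion mentioned above usually boils down to a small guard before the op that needs a plain tensor. A minimal sketch, assuming a helper named `materialize` (not part of this PR) and hedging the import path, which has moved across torch releases:

```python
import torch

try:  # torch >= 2.5 exposes DTensor here
    from torch.distributed.tensor import DTensor
except ImportError:
    try:  # older releases kept it in a private module
        from torch.distributed._tensor import DTensor
    except ImportError:
        DTensor = ()  # empty tuple makes isinstance() always False

def materialize(t):
    """Return a plain torch.Tensor regardless of whether ``t`` is sharded."""
    if isinstance(t, DTensor):
        # full_tensor() all-gathers the shards across the device mesh
        return t.full_tensor()
    return t
```

Plain tensors pass through untouched, so the same code path works with and without FSDP sharding.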
Important

The "Update branch" button must only be pressed on very rare occasions. An outdated branch is never blocking the merge of a PR. Please reach out to the automation team before pressing that button.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this

GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information