[training_utils] refactor: Extend response slicing to handle multi-dimensional model outputs #4964
Conversation
Code Review
This pull request refactors the _slice_response_from_unpad_output function by moving it to a different module to avoid circular dependencies and extends its functionality to handle multi-dimensional tensors. The extension for multi-dimensional tensors appears correct. However, a critical issue has been introduced: the module where the function is now used (verl/workers/utils/losses.py) is missing the necessary import, which will lead to a runtime error. I have also provided a suggestion to improve the accuracy of the function's docstring for better maintainability.
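If the relocated helper is still called from `verl/workers/utils/losses.py`, the fix would be a one-line import. A minimal sketch, assuming the destination module named in the later review (`verl.workers.utils.padding`):

```python
# Hypothetical import fix for the missing-import issue flagged above;
# the module path is inferred from the review text, not from the diff.
from verl.workers.utils.padding import _slice_response_from_unpad_output
```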
/gemini review
Code Review
This pull request refactors _slice_response_from_unpad_output by moving it to verl.workers.utils.padding to resolve a circular dependency and extends it to support multi-dimensional tensors. The changes are logical and well-structured for the intended purpose. However, I've identified a critical pre-existing bug in the moved function related to the calculation of max_response_len for nested tensors, which could lead to incorrect behavior and potential crashes. I've provided a suggestion to fix it.
verl/workers/utils/padding.py (outdated diff)
```python
if prompt_ids.is_nested:
    prompt_lens = prompt_ids.offsets().diff()
    response_lens = response_ids.offsets().diff()
    max_response_len = response_ids.offsets().max().item()
```
There's a critical bug in how max_response_len is calculated for nested tensors. response_ids.offsets().max().item() computes the total number of tokens in all responses in the batch, not the maximum length of a single response. This will lead to incorrect padding sizes, likely causing out-of-memory errors or producing tensors with incorrect shapes.
To fix this, you should calculate the maximum of the individual response lengths.
```diff
-    max_response_len = response_ids.offsets().max().item()
+    max_response_len = response_ids.offsets().diff().max().item()
```
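To see why the original line is wrong, note that for a jagged nested tensor `offsets()` holds cumulative token counts, so its max is the batch-wide total rather than the longest response. A minimal repro, assuming PyTorch's `torch.jagged` layout (which the `.offsets()` call suggests):

```python
import torch

# Three responses of lengths 3, 5, and 2 packed as a jagged nested tensor.
response_ids = torch.nested.nested_tensor(
    [torch.arange(3), torch.arange(5), torch.arange(2)],
    layout=torch.jagged,
)
print(response_ids.offsets())                      # tensor([ 0,  3,  8, 10])
print(response_ids.offsets().max().item())         # 10 -> total tokens (the bug)
print(response_ids.offsets().diff().max().item())  # 5  -> longest response (the fix)
```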
/gemini review
Code Review
This pull request refactors _slice_response_from_unpad_output by moving it to a new file and extending its functionality to handle multi-dimensional tensors. While the refactoring and extension are well-motivated, I've identified a critical bug in the slicing logic that can lead to incorrect memory access when processing sequences with empty prompts. I have provided a detailed comment with a suggested fix to address this issue.
/gemini review
Code Review
This pull request refactors the _slice_response_from_unpad_output function by moving it to a separate utility file to resolve circular dependencies and extends its functionality to support multi-dimensional model outputs. The changes are well-structured and include a necessary bug fix for calculating max_response_len and an assertion to enforce the assumption of non-empty prompts. My review includes one high-severity suggestion to handle empty batches gracefully, which would currently cause a runtime error.
```python
if prompt_ids.is_nested:
    prompt_lens = prompt_ids.offsets().diff()
    response_lens = response_ids.offsets().diff()
    max_response_len = response_lens.max().item()
else:
    assert not attention_mask.is_nested
    prompt_lens = attention_mask[:, : prompt_ids.shape[1]].sum(dim=1)
    response_lens = attention_mask[:, prompt_ids.shape[1] :].sum(dim=1)
    max_response_len = response_ids.shape[1]
```
The function does not handle empty batches, which will cause a runtime error. If prompt_ids represents an empty batch, response_lens will be an empty tensor, and calling .max() on it will raise an exception. For non-nested tensors, an empty batch will lead to an IndexError on sequence_offsets later on.
To improve robustness and prevent crashes, it's important to handle this edge case. I suggest adding a check for empty batches at the start of this logic block and returning an appropriately shaped empty tensor.
Suggested change:

```diff
+if (prompt_ids.is_nested and not prompt_ids.numel()) or (not prompt_ids.is_nested and prompt_ids.shape[0] == 0):
+    assert values.numel() == 0, "Non-empty values with empty batch"
+    max_response_len = 0 if prompt_ids.is_nested else response_ids.shape[1]
+    return torch.empty(0, max_response_len, *values.shape[1:], device=values.device, dtype=values.dtype)
 if prompt_ids.is_nested:
     prompt_lens = prompt_ids.offsets().diff()
     response_lens = response_ids.offsets().diff()
     max_response_len = response_lens.max().item()
 else:
     assert not attention_mask.is_nested
     prompt_lens = attention_mask[:, : prompt_ids.shape[1]].sum(dim=1)
     response_lens = attention_mask[:, prompt_ids.shape[1] :].sum(dim=1)
     max_response_len = response_ids.shape[1]
```
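A quick repro of the failure mode described above, showing why an empty batch needs the early return (a sketch, not project code):

```python
import torch

response_lens = torch.tensor([], dtype=torch.int64)  # lengths for an empty batch
try:
    response_lens.max()  # .max() on an empty tensor raises
except RuntimeError as e:
    print(e)  # max(): Expected reduction dim to be specified for input.numel() == 0
```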
What does this PR do?
- Moves `_slice_response_from_unpad_output` outside of `verl.workers.utils.losses` so that modules imported by `verl.workers.utils.losses` can import `_slice_response_from_unpad_output` without a circular import.
- Extends `_slice_response_from_unpad_output` to multi-dimensional tensors (e.g., instead of `log_probs` of shape `(S,)`, `topk_log_probs` of shape `(S, K)`).

Both changes are used by #4897 for computing top-k distillation loss.
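For intuition, a minimal sketch of response slicing that carries trailing dimensions through, so the same code path serves `(S,)` log-probs and `(S, K)` top-k log-probs. All names and the exact offset arithmetic are assumptions for illustration, not the PR's implementation:

```python
import torch

def slice_response_sketch(values, seq_starts, prompt_lens, response_lens, max_response_len):
    """Gather per-sequence response slices from a flat unpadded token stream."""
    batch_size = len(seq_starts)
    # *values.shape[1:] preserves any extra trailing dims, e.g. K for top-k.
    out = torch.zeros(batch_size, max_response_len, *values.shape[1:],
                      dtype=values.dtype, device=values.device)
    for i in range(batch_size):
        start = seq_starts[i] + prompt_lens[i]  # first response token of sequence i
        out[i, : response_lens[i]] = values[start : start + response_lens[i]]
    return out

# Two sequences packed into one 8-token stream; values carry K=3 per token.
values = torch.arange(24, dtype=torch.float32).reshape(8, 3)
out = slice_response_sketch(
    values,
    seq_starts=torch.tensor([0, 5]),
    prompt_lens=torch.tensor([2, 1]),
    response_lens=torch.tensor([3, 2]),
    max_response_len=3,
)
print(out.shape)  # torch.Size([2, 3, 3])
```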
Design & Code Changes
Mainly a refactor; the extension to multi-dimensional tensors touches only a few lines.

Additionally corrects the computation of `max_response_len` (verl/verl/workers/utils/losses.py, lines 72 to 75 in 65eb5a1).

Also adds a check for non-empty prompts:
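A minimal sketch of what this check could look like (the actual assertion in the PR may differ; `prompt_lens` here is illustrative):

```python
import torch

# Hypothetical non-empty-prompt guard; message and placement are assumptions.
prompt_lens = torch.tensor([2, 1, 4])
assert (prompt_lens > 0).all(), "_slice_response_from_unpad_output assumes non-empty prompts"
```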