[rollout, vllm, sglang] fix: forward max_tokens/max_new_tokens from rollout config to vllm/sglang backends #5028
base: main
Conversation
Code Review
This pull request correctly forwards max_tokens and max_new_tokens from the rollout configuration to the vLLM and SGLang backends, allowing for better control over generation length. The changes are applied consistently across agent_loop.py and its fully_async_policy counterpart.
I've identified a potential configuration ambiguity: if a user sets both max_tokens and max_new_tokens, the effective token limit depends on which backend is in use, which could be confusing. I've suggested adding a warning to alert users to this situation, which will improve the robustness of the configuration.
```python
for param_name in ["max_tokens", "max_new_tokens"]:
    param_value = getattr(config, param_name, None)
    if param_value is not None:
        sampling_params[param_name] = param_value
```
If both max_tokens and max_new_tokens are set in the configuration, the behavior becomes dependent on the backend being used (vllm or sglang), as they have different parameter preferences. This can lead to unexpected generation lengths and hard-to-debug issues when switching backends. Adding a warning will alert users to this potential configuration issue, improving robustness.
Suggested change:

```diff
-for param_name in ["max_tokens", "max_new_tokens"]:
-    param_value = getattr(config, param_name, None)
-    if param_value is not None:
-        sampling_params[param_name] = param_value
+params_to_check = ["max_tokens", "max_new_tokens"]
+found_params = []
+for param_name in params_to_check:
+    param_value = getattr(config, param_name, None)
+    if param_value is not None:
+        sampling_params[param_name] = param_value
+        found_params.append(param_name)
+if len(found_params) > 1:
+    logger.warning(
+        "Both 'max_tokens' and 'max_new_tokens' are set in the rollout config. "
+        "The effective value will depend on the backend ('vllm' prefers 'max_tokens', "
+        "'sglang' prefers 'max_new_tokens'). It is recommended to set only one."
+    )
```
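As a self-contained illustration, the suggested logic can be exercised outside the codebase. Note that `build_sampling_params`, the `SimpleNamespace` config, and the module-level `logger` here are stand-ins for this sketch, not the project's actual objects:

```python
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("rollout")


def build_sampling_params(config):
    """Forward max_tokens/max_new_tokens from the rollout config into
    sampling_params, warning when both are set (precedence is backend-dependent)."""
    sampling_params = {}
    found_params = []
    for param_name in ["max_tokens", "max_new_tokens"]:
        param_value = getattr(config, param_name, None)
        if param_value is not None:
            sampling_params[param_name] = param_value
            found_params.append(param_name)
    if len(found_params) > 1:
        logger.warning(
            "Both 'max_tokens' and 'max_new_tokens' are set in the rollout config. "
            "The effective value will depend on the backend ('vllm' prefers 'max_tokens', "
            "'sglang' prefers 'max_new_tokens'). It is recommended to set only one."
        )
    return sampling_params


# Only one parameter set: no warning, value forwarded as-is.
print(build_sampling_params(SimpleNamespace(max_tokens=512, max_new_tokens=None)))
# → {'max_tokens': 512}
```

With both parameters set, the dict contains both keys and the warning fires once, leaving the backend to apply its own preference.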
What does this PR do?
In the current codebase, when max_tokens is not specified in sampling_params, its default value is dynamically calculated from prompt_ids. This can produce generations that exceed the intended length, wasting computation and reducing inference performance. Configuring max_tokens in the configuration file sets a fixed upper bound that prevents uncontrolled text generation, ensuring predictable resource utilization and inference efficiency.
However, the max_tokens/max_new_tokens parameters are not passed into sampling_params in agent_loop.py, so max_tokens in vllm_async_server.py only ever takes the dynamically calculated default, which does not meet expectations in some scenarios.
Therefore, we add the two parameters max_tokens/max_new_tokens to the rollout configuration, with a default value of null. When these parameters are set in the launch script, the corresponding values in vllm_async_server.py/sglang_async_server.py are updated accordingly. When they are not set, the original default logic applies, so existing functionality is unaffected.
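The precedence described above can be sketched as follows. The function name `effective_max_tokens`, the `max_model_len` parameter, and the subtraction rule are assumptions about how the dynamic default might be computed, not the exact code in vllm_async_server.py:

```python
def effective_max_tokens(configured_max_tokens, max_model_len, prompt_ids):
    """Hypothetical sketch: use the configured cap when present,
    otherwise fall back to a dynamically computed remaining budget."""
    if configured_max_tokens is not None:
        return configured_max_tokens
    # Assumed dynamic default: whatever room is left after the prompt.
    return max_model_len - len(prompt_ids)


# Configured cap wins; otherwise the prompt length determines the budget.
print(effective_max_tokens(256, 4096, list(range(1000))))   # → 256
print(effective_max_tokens(None, 4096, list(range(1000))))  # → 3096
```

Under this sketch, leaving the config value at null reproduces the original behavior exactly, which is why the change is backward compatible.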
Checklist Before Starting
API and Usage Example
actor_rollout_ref.rollout.max_tokens=${max_response_length}
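For context, the override above would typically be appended to an existing training launch command; the entry-point module and the surrounding flags in this fragment are illustrative, not taken from this PR:

```shell
# Hypothetical launch fragment: cap rollout generation length explicitly.
max_response_length=1024
python -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.max_tokens=${max_response_length}
```

Leaving max_new_tokens unset here avoids the both-set ambiguity discussed in the review comments.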