Skip to content

[BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout#5418

Open
wuxibin89 wants to merge 11 commits intoverl-project:mainfrom
wuxibin89:wuxibin/refactor_config
Open

[BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout#5418
wuxibin89 wants to merge 11 commits intoverl-project:mainfrom
wuxibin89:wuxibin/refactor_config

Conversation

@wuxibin89
Copy link
Collaborator

What does this PR do?

#5400 (comment), get rid of actor_rollout_ref from rollout:

  • AgentLoopManager
  • AgentLoopWorker
  • AgentLoopBase: SingleTurnAgentLoop, ToolAgentLoop

Eventually, we should get rid of actor_rollout_ref in verl/trainer/config/ppo_trainer.yaml with flattend actor, ref, rollout fields:

actor:
  _target_: verl.workers.config.FSDPActorConfig
  ...

ref:
  _target_: verl.workers.config.FSDPActorConfig
 ...

rollout:
  _target_: verl.workers.config.RolloutConfig
  ...

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly refactors the system to decouple rollout configurations from the trainer, enhancing modularity. Key improvements include a backward-compatibility layer for configuration, asynchronous refactoring of AgentLoopManager using a create factory pattern, and a robust auto_await decorator. From a security standpoint, no high-severity or critical vulnerabilities were identified, and existing security boundaries are maintained. However, critical issues were found where command-line configurations for rollout resources are incorrectly overwritten by default values, which could break distributed execution. Additionally, a type hint error requires correction for improved code clarity and correctness.

@wuxibin89
Copy link
Collaborator Author

/gemini review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the configuration handling to remove the nested actor_rollout_ref structure, moving towards a flatter configuration. It also introduces significant changes to make the AgentLoopManager and its related components fully asynchronous, using an async create factory pattern and an @auto_await decorator. While the refactoring is a positive step towards cleaner configuration and async-first design, there are a few high-risk areas. The new @auto_await decorator uses stack inspection (inspect.currentframe()), which is fragile and can cause subtle issues. The AgentLoopManager's public __init__ method now leaves the object in a partially initialized state, creating an unsafe API. Finally, there are temporary hacks in the main entry points that manually copy configuration values, introducing technical debt and risk of misconfiguration. These issues should be addressed to ensure the stability and maintainability of the new design.

Comment on lines +121 to +137
# Case 1: No running loop -> run with asyncio.run()
if loop is None:
return asyncio.run(coro)

# Case 2: Running loop -> return coro if caller will await
caller_frame = inspect.currentframe()
if caller_frame is not None:
caller_frame = caller_frame.f_back
caller_is_async = caller_frame is not None and (caller_frame.f_code.co_flags & inspect.CO_COROUTINE) != 0
if caller_is_async:
return coro

# Case 3: Running loop -> run coro in thread pool
# (cannot block the loop thread without deadlock)
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
future = pool.submit(asyncio.run, coro)
return future.result()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The use of inspect.currentframe() to determine the calling context is fragile and can lead to subtle bugs. This approach might not work correctly with other Python implementations (e.g., PyPy, Jython) or if the call stack is modified by other decorators or libraries. It also adds a performance overhead. Relying on stack inspection makes the code less predictable and harder to debug. Consider a more explicit API, such as providing separate synchronous and asynchronous methods (e.g., generate_sequences and generate_sequences_async), to avoid this "magic". If this decorator is necessary for a transition period, please add a comment highlighting the risks and limitations of using inspect.

@wuxibin89 wuxibin89 force-pushed the wuxibin/refactor_config branch from d8b1798 to 66a9c5e Compare February 27, 2026 08:16
mode: async

# Number of nodes for standalone rollout server, must be > 0 in one-step-off/fully async training.
nnodes: 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need to use trainer.nnodes by default here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, rollout.nnodes > 0 means that we need to create separate GPU resource for rollout server.


def generate_sequences(self, prompts: DataProto) -> DataProto:
@auto_await
async def generate_sequences(self, prompts: DataProto) -> DataProto:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the fully async and one step off modes, the validate process still goes through this interface. During the last test, the caller did not await the generate() method, which led to some issues.

Should we not enable auto wait for this interface for now?

Copy link
Collaborator Author

@wuxibin89 wuxibin89 Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the validate process still goes through this interface

The auto_await handles 3 cases:

  1. await generate_sequences()
  2. directly call generate_sequences() in non async context(loop is None)
  3. directly call generate_sequences() in async context(loop is running)

the validate process is case 3, the ci test has cover this case right?

config.actor_rollout_ref.rollout.skip_tokenizer_init = True
config.actor_rollout_ref.rollout.nnodes = 1
config.trainer.n_gpus_per_node = 4
config.trainer.nnodes = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is config.trainer.nnodes = 1 still needed for test_agent_reward_loop_standalone?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, config.trainer.nnodes is not used in standalone mode.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants