[BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout#5418
wuxibin89 wants to merge 11 commits into verl-project:main
Conversation
Code Review
This pull request significantly refactors the system to decouple rollout configurations from the trainer, enhancing modularity. Key improvements include a backward-compatibility layer for configuration, asynchronous refactoring of AgentLoopManager using a create factory pattern, and a robust auto_await decorator. From a security standpoint, no high-severity or critical vulnerabilities were identified, and existing security boundaries are maintained. However, critical issues were found where command-line configurations for rollout resources are incorrectly overwritten by default values, which could break distributed execution. Additionally, a type hint error requires correction for improved code clarity and correctness.
/gemini review
Code Review
This pull request refactors the configuration handling to remove the nested actor_rollout_ref structure, moving towards a flatter configuration. It also introduces significant changes to make the AgentLoopManager and its related components fully asynchronous, using an async create factory pattern and an @auto_await decorator. While the refactoring is a positive step towards cleaner configuration and async-first design, there are a few high-risk areas. The new @auto_await decorator uses stack inspection (inspect.currentframe()), which is fragile and can cause subtle issues. The AgentLoopManager's public __init__ method now leaves the object in a partially initialized state, creating an unsafe API. Finally, there are temporary hacks in the main entry points that manually copy configuration values, introducing technical debt and risk of misconfiguration. These issues should be addressed to ensure the stability and maintainability of the new design.
```python
# Case 1: No running loop -> run with asyncio.run()
if loop is None:
    return asyncio.run(coro)

# Case 2: Running loop -> return coro if caller will await
caller_frame = inspect.currentframe()
if caller_frame is not None:
    caller_frame = caller_frame.f_back
caller_is_async = caller_frame is not None and (caller_frame.f_code.co_flags & inspect.CO_COROUTINE) != 0
if caller_is_async:
    return coro

# Case 3: Running loop -> run coro in thread pool
# (cannot block the loop thread without deadlock)
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(asyncio.run, coro)
    return future.result()
```
The use of inspect.currentframe() to determine the calling context is fragile and can lead to subtle bugs. This approach might not work correctly with other Python implementations (e.g., PyPy, Jython) or if the call stack is modified by other decorators or libraries. It also adds a performance overhead. Relying on stack inspection makes the code less predictable and harder to debug. Consider a more explicit API, such as providing separate synchronous and asynchronous methods (e.g., generate_sequences and generate_sequences_async), to avoid this "magic". If this decorator is necessary for a transition period, please add a comment highlighting the risks and limitations of using inspect.
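As a hedged illustration of the reviewer's suggested alternative, the sketch below shows an explicit sync/async method pair. The method names `generate_sequences` and `generate_sequences_async` come from the comment above; the class and its body are hypothetical placeholders, not verl code.

```python
import asyncio


class RolloutWorker:
    """Hypothetical worker sketching an explicit sync/async API pair,
    instead of inferring the caller's context via stack inspection."""

    async def generate_sequences_async(self, prompts: list[str]) -> list[str]:
        # Placeholder rollout: real code would call the inference engine.
        await asyncio.sleep(0)
        return [p.upper() for p in prompts]

    def generate_sequences(self, prompts: list[str]) -> list[str]:
        # Sync entry point: only legal when no event loop is running,
        # which makes the calling contract explicit rather than inferred.
        return asyncio.run(self.generate_sequences_async(prompts))
```

With this shape, async callers await `generate_sequences_async` directly and sync callers use `generate_sequences`; neither path needs frame inspection.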
Force-pushed from d8b1798 to 66a9c5e
```yaml
mode: async

# Number of nodes for standalone rollout server, must be > 0 in one-step-off/fully async training.
nnodes: 0
```
Do we not need to use trainer.nnodes by default here?
No, `rollout.nnodes > 0` means that we need to create separate GPU resources for the rollout server.
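For illustration, a hedged sketch of what a standalone rollout allocation might look like under the flattened config. Field names follow the discussion above; the concrete values are hypothetical:

```yaml
rollout:
  mode: async
  nnodes: 1            # > 0: allocate dedicated nodes for the standalone rollout server
trainer:
  nnodes: 2            # trainer GPU nodes; independent of rollout.nnodes
  n_gpus_per_node: 8
```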
```diff
-    def generate_sequences(self, prompts: DataProto) -> DataProto:
+    @auto_await
+    async def generate_sequences(self, prompts: DataProto) -> DataProto:
```
In the fully async and one-step-off modes, the validation process still goes through this interface. During the last test, the caller did not await `generate()`, which led to some issues.
Should we disable auto-await for this interface for now?
> the validate process still goes through this interface

The `auto_await` decorator handles 3 cases:
- `await generate_sequences()`
- directly calling `generate_sequences()` in a non-async context (no running loop)
- directly calling `generate_sequences()` in an async context (loop is running)

The validation process is case 3; the CI test has covered this case, right?
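The three cases can be exercised with a self-contained sketch. This mirrors the diff above but is a reconstruction for illustration, not the exact verl implementation:

```python
import asyncio
import concurrent.futures
import functools
import inspect


def auto_await(fn):
    """Sketch of the decorator's three dispatch cases."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        coro = fn(*args, **kwargs)
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            loop = None
        # Case 1: no running loop -> run synchronously with asyncio.run().
        if loop is None:
            return asyncio.run(coro)
        # Case 2: caller is itself a coroutine -> hand back the coroutine
        # so the caller can await it.
        frame = inspect.currentframe()
        caller = frame.f_back if frame is not None else None
        if caller is not None and (caller.f_code.co_flags & inspect.CO_COROUTINE):
            return coro
        # Case 3: running loop but sync caller -> run the coroutine on a
        # fresh event loop in a worker thread to avoid deadlocking the loop.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, coro).result()

    return wrapper


@auto_await
async def double(x):
    # Stand-in for generate_sequences(); just doubles its input.
    await asyncio.sleep(0)
    return 2 * x
```

Calling `double(3)` from plain sync code hits case 1, `await double(4)` inside a coroutine hits case 2, and calling `double(5)` from a sync helper while a loop is running hits case 3.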
```python
config.actor_rollout_ref.rollout.skip_tokenizer_init = True
config.actor_rollout_ref.rollout.nnodes = 1
config.trainer.n_gpus_per_node = 4
config.trainer.nnodes = 1
```
Is config.trainer.nnodes = 1 still needed for test_agent_reward_loop_standalone?
No, `config.trainer.nnodes` is not used in standalone mode.
What does this PR do?
#5400 (comment): get rid of actor_rollout_ref from rollout.
Eventually, we should get rid of actor_rollout_ref in
verl/trainer/config/ppo_trainer.yaml, with flattened actor, ref, rollout fields:
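A rough sketch of the intended flattening; the structure is inferred from the sentence above, and the section contents are placeholders, not the actual ppo_trainer.yaml fields:

```yaml
# Before: everything nested under actor_rollout_ref
actor_rollout_ref:
  actor: { ... }
  ref: { ... }
  rollout: { ... }

# After: flattened top-level sections
actor: { ... }
ref: { ... }
rollout: { ... }
```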