[BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout#5418
wuxibin89 wants to merge 11 commits into verl-project:main
Conversation
Code Review
This pull request significantly refactors the system to decouple rollout configurations from the trainer, enhancing modularity. Key improvements include a backward-compatibility layer for configuration, asynchronous refactoring of AgentLoopManager using a create factory pattern, and a robust auto_await decorator. From a security standpoint, no high-severity or critical vulnerabilities were identified, and existing security boundaries are maintained. However, critical issues were found where command-line configurations for rollout resources are incorrectly overwritten by default values, which could break distributed execution. Additionally, a type hint error requires correction for improved code clarity and correctness.
/gemini review
Code Review
This pull request refactors the configuration handling to remove the nested actor_rollout_ref structure, moving towards a flatter configuration. It also introduces significant changes to make the AgentLoopManager and its related components fully asynchronous, using an async create factory pattern and an @auto_await decorator. While the refactoring is a positive step towards cleaner configuration and async-first design, there are a few high-risk areas. The new @auto_await decorator uses stack inspection (inspect.currentframe()), which is fragile and can cause subtle issues. The AgentLoopManager's public __init__ method now leaves the object in a partially initialized state, creating an unsafe API. Finally, there are temporary hacks in the main entry points that manually copy configuration values, introducing technical debt and risk of misconfiguration. These issues should be addressed to ensure the stability and maintainability of the new design.
```python
# Case 1: No running loop -> run with asyncio.run()
if loop is None:
    return asyncio.run(coro)

# Case 2: Running loop -> return coro if caller will await
caller_frame = inspect.currentframe()
if caller_frame is not None:
    caller_frame = caller_frame.f_back
caller_is_async = caller_frame is not None and (caller_frame.f_code.co_flags & inspect.CO_COROUTINE) != 0
if caller_is_async:
    return coro

# Case 3: Running loop -> run coro in thread pool
# (cannot block the loop thread without deadlock)
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(asyncio.run, coro)
    return future.result()
```
The use of inspect.currentframe() to determine the calling context is fragile and can lead to subtle bugs. This approach might not work correctly with other Python implementations (e.g., PyPy, Jython) or if the call stack is modified by other decorators or libraries. It also adds a performance overhead. Relying on stack inspection makes the code less predictable and harder to debug. Consider a more explicit API, such as providing separate synchronous and asynchronous methods (e.g., generate_sequences and generate_sequences_async), to avoid this "magic". If this decorator is necessary for a transition period, please add a comment highlighting the risks and limitations of using inspect.
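As a hedged illustration of the reviewer's suggested alternative, the sketch below shows an explicit sync/async method pair. The method names `generate_sequences` and `generate_sequences_async` come from the comment above; the class and its body are hypothetical placeholders, not verl code.

```python
import asyncio


class RolloutWorker:
    """Hypothetical worker sketching an explicit sync/async API pair,
    instead of inferring the caller's context via stack inspection."""

    async def generate_sequences_async(self, prompts: list[str]) -> list[str]:
        # Placeholder rollout: real code would call the inference engine.
        await asyncio.sleep(0)
        return [p.upper() for p in prompts]

    def generate_sequences(self, prompts: list[str]) -> list[str]:
        # Sync entry point: only legal when no event loop is running,
        # which makes the calling contract explicit rather than inferred.
        return asyncio.run(self.generate_sequences_async(prompts))
```

With this shape, async callers await `generate_sequences_async` directly and sync callers use `generate_sequences`; neither path needs frame inspection.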
Force-pushed from d8b1798 to 66a9c5e
```yaml
mode: async

# Number of nodes for standalone rollout server, must be > 0 in one-step-off/fully async training.
nnodes: 0
```
Do we not need to use trainer.nnodes by default here?
No, `rollout.nnodes > 0` means that we need to create separate GPU resources for the rollout server.
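For illustration, a hedged sketch of what a standalone rollout allocation might look like under the flattened config. Field names follow the discussion above; the concrete values are hypothetical:

```yaml
rollout:
  mode: async
  nnodes: 1            # > 0: allocate dedicated nodes for the standalone rollout server
trainer:
  nnodes: 2            # trainer GPU nodes; independent of rollout.nnodes
  n_gpus_per_node: 8
```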
```diff
-    def generate_sequences(self, prompts: DataProto) -> DataProto:
+    @auto_await
+    async def generate_sequences(self, prompts: DataProto) -> DataProto:
```
In the fully async and one-step-off modes, the validation process still goes through this interface. During the last test, the caller did not await `generate()`, which led to some issues.
Should we disable auto-await for this interface for now?
> the validate process still goes through this interface

The `auto_await` decorator handles 3 cases:
- `await generate_sequences()`
- directly calling `generate_sequences()` in a non-async context (no running loop)
- directly calling `generate_sequences()` in an async context (loop is running)

The validation process is case 3; the CI test has covered this case, right?
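The three cases can be exercised with a self-contained sketch. This mirrors the diff above but is a reconstruction for illustration, not the exact verl implementation:

```python
import asyncio
import concurrent.futures
import functools
import inspect


def auto_await(fn):
    """Sketch of the decorator's three dispatch cases."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        coro = fn(*args, **kwargs)
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            loop = None
        # Case 1: no running loop -> run synchronously with asyncio.run().
        if loop is None:
            return asyncio.run(coro)
        # Case 2: caller is itself a coroutine -> hand back the coroutine
        # so the caller can await it.
        frame = inspect.currentframe()
        caller = frame.f_back if frame is not None else None
        if caller is not None and (caller.f_code.co_flags & inspect.CO_COROUTINE):
            return coro
        # Case 3: running loop but sync caller -> run the coroutine on a
        # fresh event loop in a worker thread to avoid deadlocking the loop.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, coro).result()

    return wrapper


@auto_await
async def double(x):
    # Stand-in for generate_sequences(); just doubles its input.
    await asyncio.sleep(0)
    return 2 * x
```

Calling `double(3)` from plain sync code hits case 1, `await double(4)` inside a coroutine hits case 2, and calling `double(5)` from a sync helper while a loop is running hits case 3.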
```python
config.actor_rollout_ref.rollout.skip_tokenizer_init = True
config.actor_rollout_ref.rollout.nnodes = 1
config.trainer.n_gpus_per_node = 4
config.trainer.nnodes = 1
```
Is config.trainer.nnodes = 1 still needed for test_agent_reward_loop_standalone?
No, `config.trainer.nnodes` is not used in standalone mode.
What does this PR do?
#5400 (comment): get rid of actor_rollout_ref from rollout.
Eventually, we should get rid of actor_rollout_ref in
verl/trainer/config/ppo_trainer.yaml, with flattened actor, ref, rollout fields:
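A rough sketch of the intended flattening; the structure is inferred from the sentence above, and the section contents are placeholders, not the actual ppo_trainer.yaml fields:

```yaml
# Before: everything nested under actor_rollout_ref
actor_rollout_ref:
  actor: { ... }
  ref: { ... }
  rollout: { ... }

# After: flattened top-level sections
actor: { ... }
ref: { ... }
rollout: { ... }
```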