Commit b325cc9

Merge branch 'verl-project:main' into main

2 parents: f73636c + e3b187a
File tree: 132 files changed, +2403 −4404 lines

.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@
 /verl/workers/actor/megatron_actor.py @ISEEKYAN @vermouth1992
 /verl/workers/critic/megatron_critic.py @ISEEKYAN @vermouth1992
 /verl/workers/megatron_workers.py @ISEEKYAN @vermouth1992
+/verl/experimental @wuxibin89 @ArronHZG
 
 /tests/single_controller @zw0610 @wuxibin89
 /tests/trainer @eric-haibin-lin @vermouth1992 @tongyx361 @PeterSH6

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 
 - [ ] Search for similar PRs. Paste at least one query link here: ...
 - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
-  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
+  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`
   - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
   - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
 - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.

.github/workflows/e2e_ascend.yml

Lines changed: 4 additions & 0 deletions
@@ -126,6 +126,10 @@ jobs:
           ray stop --force
           export PYTHONPATH=$PYTHONPATH:/Megatron-LM
           USE_DIST_CKPT=True USE_DUMMY_MODEL=True DUMMY_MODEL_CONFIG_PATH=tests/special_e2e/ppo_trainer/expert_parallel/qwen3moe_minimal.json DUMMY_MODEL_PATH=$HOME/dist_ckpt/qwen3_30b_grpo_mindspeed bash tests/special_npu/run_qwen3_30b_grpo_mindspeed.sh
+      - name: Running the E2E test with fully_async_policy algorithm (FSDP2)
+        run: |
+          ray stop --force
+          bash tests/special_npu/run_fully_async_policy.sh
 
   vlm_rl_job:
     if: github.repository_owner == 'verl-project'

.github/workflows/e2e_one_step_off_policy_ascend.yml

Lines changed: 3 additions & 3 deletions
@@ -68,7 +68,7 @@ on:
       # Entrypoints
       - ".github/workflows/e2e_one_step_off_policy_ascend.yml"
       - "examples/data_preprocess/gsm8k.py"
-      - "tests/special_e2e/run_one_step_off_policy.sh"
+      - "tests/special_npu/run_one_step_off_policy.sh"
 
 # Cancel jobs on the same ref if a new one is triggered
 concurrency:
@@ -122,7 +122,7 @@ jobs:
       - name: Running the E2E test with one_step_off_policy algorithm (FSDP2)
        run: |
          ray stop --force
-          bash tests/special_e2e/run_one_step_off_policy.sh
+          bash tests/special_npu/run_one_step_off_policy.sh
 
   # Test Megatron strategy
   e2e_one_step_off_policy_megatron_ascend:
@@ -167,4 +167,4 @@ jobs:
         run: |
           ray stop --force
           export PYTHONPATH=$PYTHONPATH:/Megatron-LM
-          bash tests/special_e2e/run_one_step_off_policy.sh
+          bash tests/special_npu/run_one_step_off_policy.sh

.github/workflows/e2e_sft_llm.yml

Lines changed: 1 addition & 9 deletions
@@ -110,7 +110,7 @@ jobs:
       - name: Prepare gsm8k dataset
         run: |
           ray stop --force
-          python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
+          python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
       - name: Running GSM8K E2E training tests on 8 L20 GPUs with rmpad using function rm
         run: |
           ray stop --force
@@ -123,10 +123,6 @@ jobs:
         run: |
           ray stop --force
           SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
-      - name: Check loss difference between sequence parallel vs. default implementation
-        run: |
-          ray stop --force
-          ENTRYPOINT="tests/special_e2e/sft/test_sp_loss_match.py" SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
       - name: Running GSM8K E2E training tests on 8 L20 GPUs with sequence parallism and liger
         run: |
           ray stop --force
@@ -140,10 +136,6 @@ jobs:
           ray stop --force
           LORA_RANK=32 RESUME_MODE=auto TOTAL_TRAIN_STEP=2 bash tests/special_e2e/sft/run_sft.sh
       # TODO: multiturn
-      - name: Prepare gsm8k dataset
-        run: |
-          ray stop --force
-          python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
       - name: Running GSM8K E2E training tests with multiturn and various configs and compare results
         run: |
           bash tests/special_e2e/sft/test_sft_engine_all.sh

.github/workflows/e2e_sft_llm_ascend.yml

Lines changed: 1 addition & 10 deletions
@@ -109,7 +109,7 @@ jobs:
           ln -s /root/.cache/models ~/models
       - name: Prepare gsm8k dataset
         run: |
-          python examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
+          python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
       - name: Running GSM8K E2E training tests on 8 NPUs with rmpad using function rm
         run: |
           ray stop --force
@@ -122,10 +122,6 @@ jobs:
         run: |
           ray stop --force
           SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
-      - name: Check loss difference between sequence parallel vs. default implementation
-        run: |
-          ray stop --force
-          ENTRYPOINT="tests/special_e2e/sft/test_sp_loss_match.py" SP_SIZE=2 bash tests/special_e2e/sft/run_sft.sh
       - name: Running GSM8K E2E training tests with LoRA
         run: |
           ray stop --force
@@ -134,11 +130,6 @@ jobs:
         run: |
           ray stop --force
           LORA_RANK=32 RESUME_MODE=auto TOTAL_TRAIN_STEP=2 bash tests/special_e2e/sft/run_sft.sh
-      # TODO: multiturn
-      - name: Prepare gsm8k dataset
-        run: |
-          ray stop --force
-          python3 examples/data_preprocess/gsm8k_multiturn_sft.py --local_dataset_path ${HOME}/.cache/datasets/openai/gsm8k
       - name: Running GSM8K E2E training tests with multiturn and various configs and compare results
         run: |
           export PYTHONPATH=$PYTHONPATH:/Megatron-LM

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@
 **/playground
 **/wandb
 
+/pyrightconfig.json
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

docker/Dockerfile.stable.vllm

Lines changed: 3 additions & 0 deletions
@@ -32,6 +32,9 @@ RUN pip install torch==2.9.1 torchvision torchaudio --index-url https://download
 RUN sed -i '/nvidia-cudnn-cu12/d' /usr/local/lib/python3.12/dist-packages/torch-2.9.1+cu129.dist-info/METADATA
 RUN pip install --no-deps --force-reinstall nvidia-cudnn-cu12==9.16.0.29
 
+# NOTE: This installs the `vllm` source code in `/vllm`.
+# This might break the (based)pyright type checking. To fix it, add `/vllm` to `extraPaths` in `pyrightconfig.json`.
+# c.f. https://docs.basedpyright.com/latest/configuration/config-files/
 RUN git clone --depth 1 -b v0.12.0 https://github.com/vllm-project/vllm.git && \
     cd vllm && \
     find requirements -name "*.txt" -print0 | xargs -0 sed -i '/torch/d' && \
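The note introduced in this Dockerfile points at (based)pyright's `extraPaths` setting. A minimal `pyrightconfig.json` following that advice might look like the sketch below; this file is not part of the commit, it only illustrates the one setting the comment describes (it also explains the `/pyrightconfig.json` entry added to `.gitignore` in this commit, since such a file is machine-local):

```json
{
  "extraPaths": ["/vllm"]
}
```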

docs/advance/fully_async.md

Lines changed: 0 additions & 24 deletions
@@ -106,9 +106,6 @@ https://github.com/ArronHZG/verl-community/blob/main/docs/fully_async_policy_rev
 | `async_training.trigger_parameter_sync_step` | Indicates how many local updates FullyAsyncTrainer performs before a parameter synchronization |
 | `async_training.staleness_threshold` | Freshness control |
 | `async_training.partial_rollout` | Whether to perform partial_rollout |
-| `async_training.checkpoint_engine.enable` | Whether to use checkpoint_engine for accelerating, default `True` |
-| `async_training.checkpoint_engine.overlap_broadcast_and_consume` | When use checkpoint_engine, whether to overlap broadcast and load_weights, default `False` |
-| `async_training.checkpoint_engine.device_buffer_size_M` | When use checkpoint_engine, the user-specific bucket size (MB), default `4096` |
 | `async_training.use_trainer_do_validate` | Whether use trainer node to do validate process, default `False` |
 
 **Further Explanation:**
@@ -182,27 +179,6 @@ https://github.com/ArronHZG/verl-community/blob/main/docs/fully_async_policy_rev
   mode d
   (async stream pipeline with partial rollout), our implementation approximates `Areal's Decoupled PPO`.
 
-* `async_training.checkpoint_engine.enable`
-
-  Enabling the checkpoint engine generally reduces synchronization time overhead by more than 60% compared to
-  the original per-tensor parameter synchronization method. However, assembling buckets incurs additional
-  temporary GPU memory overhead.
-
-* `async_training.checkpoint_engine.overlap_broadcast_and_consume`
-
-  Enabling pipeline between the broadcast and load_weights parameters will allocate additional GPU memory.
-  Since the main time consumption for parameter synchronization is not in the broadcast and load_weights phases,
-  but in the parameter generation phase (by megatron or FSDP), this option is off by default.
-
-* `async_training.checkpoint_engine.device_buffer_size_M`
-
-  It controls the size of the memory buffer used for synchronization when the checkpoint-engine is enabled.
-  The actual `bucket_size` = `max(device_buffer_size_M, maximum parameter tensor size)`.
-    * When enable `overlap_broadcast_and_consume`, the additional device memory overhead of
-      trainer rank is `3 * bucket_size` and rollout rank is `2 * bucket_size`
-    * When disable `overlap_broadcast_and_consume`, the additional device memory overhead of
-      trainer rank is `2 * bucket_size` and rollout rank is `1 * bucket_size`
-
 * `async_training.use_trainer_do_validate`
 
   It controls whether to use the trainer's `do_validate` method for validation.
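The `device_buffer_size_M` documentation removed above defines `bucket_size = max(device_buffer_size_M, maximum parameter tensor size)` and gives per-rank memory multipliers. As a quick sanity check, that arithmetic can be sketched as follows; `sync_overhead_mb` is a hypothetical helper for illustration, not part of verl:

```python
def sync_overhead_mb(device_buffer_size_m: int,
                     max_tensor_size_mb: int,
                     overlap_broadcast_and_consume: bool) -> tuple[int, int]:
    """Extra device memory (MB) per (trainer rank, rollout rank), per the removed doc.

    bucket_size is the larger of the configured buffer size and the largest
    parameter tensor. Overlapping broadcast with load_weights costs one extra
    bucket on each side (3x/2x vs. 2x/1x).
    """
    bucket_size = max(device_buffer_size_m, max_tensor_size_mb)
    if overlap_broadcast_and_consume:
        return 3 * bucket_size, 2 * bucket_size
    return 2 * bucket_size, 1 * bucket_size
```

With the documented default `device_buffer_size_M = 4096` and no overlap, a trainer rank would need 8192 MB and a rollout rank 4096 MB of temporary buffer space.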

docs/advance/mtp.md

Lines changed: 5 additions & 3 deletions
@@ -2,19 +2,21 @@
 
 **Author**: `https://github.com/meituan-search`
 
-Last updated: 01/30/2026
+Last updated: 02/15/2026
 
 # 1. Scope of Support
 
 Currently, RL training can be performed on mimo-7B-RL, Qwen-next, and Deepseek series models based on the MTP architecture. The support rules for training and inference engines are as follows:
 
-- **Training Engine**: Only supports the `mbridge + megatron` combination; other training engines are not compatible at this time;
+- **Training Engine**: Only supports the `mbridge/Megatron-Bridge + megatron` combination; other training engines are not compatible at this time;
 
 - **Inference Engine**: Compatible with all engines, but the model must be in the corresponding engine's compatibility list;
 
 - **Dependency Versions**:
 
-  - mbridge: Use the specified branch: [https://github.com/ArronHZG/mbridge/tree/feature/verl_mtp](https://github.com/ArronHZG/mbridge/tree/feature/verl_mtp) (will be merged into the main branch in the future);
+  - mbridge: Apply the patches and review suggestions from PR: [#62](https://github.com/ISEEKYAN/mbridge/pull/62) (will be merged into the main branch in the future);
+
+  - Megatron-Bridge: Apply the patches and review suggestions from PR if you want to try out mimo-7B-RL: [#2387](https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/2387) (will be merged into the main branch in the future);
 
   - megatron: Use the latest dev version (commit: [23e092f41ec8bc659020e401ddac9576c1cfed7e](https://github.com/NVIDIA/Megatron-LM/tree/23e092f41ec8bc659020e401ddac9576c1cfed7e)), which supports MTP + CP training methods.
