Commit f73636c

Merge branch 'verl-project:main' into main
2 parents 179feec + 712de01 commit f73636c

84 files changed, +5072 -873 lines changed

.github/workflows/e2e_ppo_trainer_veomni_vllm.yml

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ jobs:
 - name: Running GEO3K E2E training tests on 8 L20 GPUs with veomni engine (FSDP_SIZE=8, USP=1)
 run: |
 ray stop --force
-MODEL_ID=Qwen/Qwen3-VL-2B-Instruct TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/gsm8k/test.parquet VAL_BEFORE_TRAIN=True NUM_GPUS=8 FSDP_SIZE=8 SP_SIZE=1 EP_SIZE=1 VERL_EXP_NAME="qwen3-2b-vl-function-reward-minimal-fsdp-size8" bash tests/special_e2e/run_ppo_trainer_veomni.sh
+MODEL_ID=Qwen/Qwen3-VL-2B-Instruct TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/gsm8k/test.parquet VAL_BEFORE_TRAIN=True NUM_GPUS=8 FSDP_SIZE=4 SP_SIZE=2 EP_SIZE=1 VERL_EXP_NAME="qwen3-2b-vl-function-reward-minimal-fsdp-size8" bash tests/special_e2e/run_ppo_trainer_veomni.sh

 cleanup:
 runs-on: ubuntu-latest

.github/workflows/gpu_unit_tests.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -108,7 +108,7 @@ jobs:
108108
pip3 install hf_transfer
109109
pip3 install -r requirements-test.txt
110110
pip3 install --no-deps -e .
111-
pip3 install cupy-cuda12x pytest-asyncio
111+
pip3 install cupy-cuda12x==13.6.0 pytest-asyncio
112112
pip3 install --ignore-installed blinker
113113
pip3 install --ignore-installed mlflow "numpy<2.0"
114114
- name: Run all GPU unit tests

.github/workflows/sgl.yml

Lines changed: 2 additions & 2 deletions
@@ -113,7 +113,7 @@ jobs:
 fetch-depth: 0
 - name: Install the current repository
 run: |
-pip3 install cupy-cuda12x pytest-asyncio
+pip3 install cupy-cuda12x==13.6.0 pytest-asyncio
 pip3 install hf_transfer fastmcp pytest-asyncio
 pip3 install -r requirements-test.txt
 pip3 install --no-deps -e .
@@ -144,7 +144,7 @@ jobs:
 fetch-depth: 0
 - name: Install the current repository
 run: |
-pip3 install cupy-cuda12x pytest-asyncio
+pip3 install cupy-cuda12x==13.6.0 pytest-asyncio
 pip3 install hf_transfer fastmcp pytest-asyncio
 pip3 install -r requirements-test.txt
 pip3 install --no-deps -e .

.github/workflows/vllm.yml

Lines changed: 1 addition & 1 deletion
@@ -144,7 +144,7 @@ jobs:
 fetch-depth: 0
 - name: Install the current repository
 run: |
-pip3 install cupy-cuda12x pytest-asyncio
+pip3 install pytest-asyncio
 pip3 install -r requirements-test.txt
 pip3 install --no-deps -e .
 pip3 install --upgrade "transformers<5.0"

README.md

Lines changed: 1 addition & 0 deletions
@@ -283,6 +283,7 @@ Welcome to register your awesome project build with `verl` for other developers'
 - [NoisyRollout](https://github.com/NUS-TRAIL/NoisyRollout): Reinforcing Visual Reasoning with Data Augmentation ![GitHub Repo stars](https://img.shields.io/github/stars/NUS-TRAIL/NoisyRollout)
 - [SPEAR](https://github.com/TencentYoutuResearch/SPEAR): **Self-imitation** with **Progressive Exploration** for Agentic Reinforcement Learning (ICLR 2026) ![GitHub Repo stars](https://img.shields.io/github/stars/TencentYoutuResearch/SPEAR)
 - [RuleReasoner](https://github.com/bigai-nlco/RuleReasoner): **RuleReasoner:** Reinforced Rule-based Reasoning via **Domain-aware Dynamic Sampling** (ICLR 2026) ![GitHub Repo stars](https://img.shields.io/github/stars/bigai-nlco/RuleReasoner)
+- [MetaphorStar](https://metaphorstar.github.io/): **Image Metaphor** Understanding and Reasoning with End-to-End **Visual Reinforcement Learning** ![GitHub Repo stars](https://img.shields.io/github/stars/MING-ZCH/MetaphorStar)

 ## Contribution Guide

docs/ascend_tutorial/ascend_quick_start.rst

Lines changed: 20 additions & 19 deletions
@@ -1,7 +1,7 @@
 Ascend Quickstart
 ===================================

-Last updated: 12/11/2025.
+Last updated: 2/13/2026.

 We are adding support for Huawei Ascend devices to verl.

@@ -67,19 +67,20 @@ Dockerfile image build & usage
 +---------------+----------------------+
 | triton-ascend | == 3.2.0rc4          |
 +---------------+----------------------+
-| transformers  | latest release       |
+| transformers  | == 4.57.6            |
 +---------------+----------------------+
-
+
+tips: verl does not support transformers 5.0.0 or higher
 Installation commands:
-
+
 .. code-block:: bash
-
+
 # Install torchvision; the version must match torch
 pip install torchvision==0.22.1
-
+
 # Remove any leftover triton/triton-ascend packages from the environment
 pip uninstall -y triton triton-ascend
-
+
 # Install triton-ascend; triton does not need to be installed separately
 pip install triton-ascend==3.2.0rc4

@@ -115,30 +116,30 @@ Dockerfile image build & usage
 MindSpeed source installation commands:

 .. code-block:: bash
-
+
 # Download MindSpeed, check out the specified commit ID, and download Megatron-LM
 git clone https://gitcode.com/Ascend/MindSpeed.git
 cd MindSpeed && git checkout f2b0977e && cd ..
 git clone --depth 1 --branch core_v0.12.1 https://github.com/NVIDIA/Megatron-LM.git
-
+
 # Install MindSpeed & Megatron
 pip install -e MindSpeed
-
+
 # Add the Megatron-LM source path to the PYTHONPATH environment variable
 export PYTHONPATH=$PYTHONPATH:"$(pwd)/Megatron-LM"
-
+
 # (Optional) To keep PYTHONPATH in effect after the shell is closed or the system restarts, add it to the .bashrc configuration file
 echo "export PYTHONPATH=$PYTHONPATH:\"$(pwd)/Megatron-LM\"" >> ~/.bashrc
-
+
 # Install mbridge
 pip install mbridge

 MindSpeed is used with the Megatron-LM backend, as follows:

 1. Set the verl worker model ``strategy`` to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
-
+
 2. Custom MindSpeed arguments can be passed through the ``override_transformer_config`` parameter; for example, to enable the FA feature for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``.
-
+
 3. For more feature details, see the `MindSpeed & verl documentation <https://gitcode.com/Ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.


@@ -163,7 +164,7 @@ Ecosystem libraries not yet supported on Ascend in verl:
 +---------------+----------------+
 | liger-kernel  | not supported  |
 +---------------+----------------+
-
+
 1. Enabling flash attention acceleration through flash_attn is not supported; it is supported through transformers.
 2. Enabling liger-kernel is not supported.

@@ -175,17 +176,17 @@ Ecosystem libraries not yet supported on Ascend in verl:
 1. Download the dataset and preprocess it into parquet format so that it contains the fields required to compute RL rewards

 .. code-block:: bash
-
+
 python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k

 2. Run training

 .. code-block:: bash
-
+
 set -x
-
+
 export VLLM_ATTENTION_BACKEND=XFORMERS
-
+
 python3 -m verl.trainer.main_ppo \
 algorithm.adv_estimator=grpo \
 data.train_files=$HOME/data/gsm8k/train.parquet \

docs/ascend_tutorial/dockerfile_build_guidance.rst

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ A3 8.3.RC1 SGLang `Dockerfile.ascend.sglang_8.3.rc
 # vLLM
 docker build -f Dockerfile.ascend_8.3.rc1_a2 -t verl-ascend:8.3.rc1-a2 .
 # SGLang
-docker build -f Dockerfile.ascend_8.3.rc1_a2 -t verl-ascend-sglang:8.3.rc1-a2 .
+docker build -f Dockerfile.ascend.sglang_8.3.rc1_a2 -t verl-ascend-sglang:8.3.rc1-a2 .

 Public image URLs
 --------------------
