Commit 6f6aeb3

Merge branch 'verl-project:main' into main
2 parents c867643 + 7f4b76a commit 6f6aeb3

117 files changed: +5929 -694 lines changed

.github/workflows/cpu_unit_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ jobs:
  run: |
  pip3 install -r requirements-test.txt
  pip3 install --no-deps -e .
- pip3 install --upgrade transformers
+ pip3 install --upgrade "transformers<5.0.0"
  - name: Download datasets
  run: |
  python3 examples/data_preprocess/gsm8k.py --local_dataset_path ${HOME}/models/hf_data/gsm8k
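Reviewer note: the new `"transformers<5.0.0"` specifier caps the upgrade below the 5.x series. The sketch below is a rough, stdlib-only approximation of that bound, assuming it is enough to compare the leading major number; pip itself applies the full PEP 440 rules. The function name `below_major` and the sample version strings are hypothetical and do not come from the commit.

```python
import re


def below_major(version: str, major_bound: int = 5) -> bool:
    """Return True if the leading major number of `version` is below `major_bound`.

    Approximation only: epochs and other PEP 440 details are not handled.
    """
    match = re.match(r"\s*v?(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)) < major_bound


if __name__ == "__main__":
    # Illustrative version strings, not taken from the commit.
    for candidate in ("4.57.1", "5.0.0rc1", "5.0.0"):
        print(candidate, below_major(candidate))
```

Under this approximation a 5.0.0 pre-release is rejected as well, which is consistent with how an exclusive `<5.0.0` bound treats pre-releases of the named version.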

.github/workflows/gpu_unit_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ jobs:
  pip3 install --ignore-installed mlflow "numpy<2.0"
  - name: Run all GPU unit tests
  run: |
- pytest -s -x --ignore-glob="*on_npu.py" --ignore-glob="*test_special_*.py" --ignore-glob='*on_cpu.py' --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob='tests/special*' --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" tests/
+ pytest -s -x --ignore-glob="*on_npu.py" --ignore-glob="*test_special_*.py" --ignore-glob='*on_cpu.py' --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob='tests/special*' --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" --ignore-glob="*test_shared_memory*" tests/
  - name: Testing LinearCrossEntropyTP Correctness, Computation Time and Memory Consumption
  run: |
  LOW_MEMORY=True torchrun --standalone --nnodes=1 --nproc-per-node=8 tests/utils/test_special_linear_cross_entropy_tp.py
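Reviewer note: the only change in this hunk is the extra `--ignore-glob="*test_shared_memory*"`. The sketch below shows which paths such a pattern would exclude, assuming pytest applies shell-style (`fnmatch`) matching to each collected path. The helper `is_ignored` and the example paths are hypothetical; the commit does not show which test files the pattern is meant to catch.

```python
from fnmatch import fnmatchcase

# Pattern added to the pytest invocation in this hunk.
IGNORE_GLOB = "*test_shared_memory*"


def is_ignored(path: str, pattern: str = IGNORE_GLOB) -> bool:
    """Return True if `path` matches the shell-style ignore pattern."""
    # In fnmatch-style matching, `*` also spans "/" separators.
    return fnmatchcase(path, pattern)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("tests/utils/test_shared_memory.py", "tests/utils/test_fs.py"):
        print(path, is_ignored(path))
```

Because the pattern begins and ends with `*`, it matches the substring anywhere in the path, so a directory with that name would be skipped along with individual files.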

.github/workflows/model.yml

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ jobs:
  run: |
  pip3 install -r requirements-test.txt
  pip3 install --no-deps -e .
- pip3 install --upgrade transformers
+ pip3 install --upgrade "transformers<5.0.0"
  - name: Running rmpad model tests on 8 L20 GPUs + flash_attn 2.5.8
  run: |
  pytest -s tests/models/test_transformer.py

.github/workflows/npu_unit_tests.yml

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ jobs:
  - name: Run all NPU unit tests
  run: |
  export PYTHONPATH=$PYTHONPATH:/Megatron-LM
- pytest -s -x --ignore-glob="*test_special_*.py" --ignore-glob="*on_cpu.py" --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob="tests/special*" --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" --ignore-glob="*test_rvdz*" --ignore-glob="*test_ray_collectives*" --ignore-glob="*test_nvtx_profile*" --ignore-glob="tests/checkpoint_engine" tests/
+ pytest -s -x --ignore-glob="*test_special_*.py" --ignore-glob="*on_cpu.py" --ignore-glob="*test_vllm*" --ignore-glob="*_sglang*" --ignore-glob="*_hf_rollout*" --ignore-glob="tests/models/" --ignore-glob="tests/special*" --ignore-glob="tests/experimental" --ignore-glob="tests/workers/reward_model" --ignore-glob="*test_rvdz*" --ignore-glob="*test_ray_collectives*" --ignore-glob="*test_nvtx_profile*" --ignore-glob="tests/checkpoint_engine" --ignore-glob="*test_shared_memory*" tests/
  - name: Testing FSDP2 actor functionality
  run: |
  torchrun --standalone --nnodes=1 --nproc-per-node=2 tests/workers/actor/test_special_dp_actor.py

.github/workflows/reward_model_sglang.yml

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ jobs:
  - name: Running sglang agent loop with reward manager tests on 8 L20 GPUs
  run: |
  unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
- ROLLOUT_NAME=sglang pytest -s -x tests/experimental/reward_loop/test_agent_loop_reward_manager.py
+ ROLLOUT_NAME=sglang pytest -s -x tests/experimental/reward_loop/test_agent_reward_loop_standalone.py
  - name: Running sglang agent loop with reward model colocate tests on 8 L20 GPUs
  run: |
  unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY

.github/workflows/reward_model_vllm.yml

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ jobs:
  - name: Running vllm agent loop with reward manager tests on 8 L20 GPUs
  run: |
  unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
- ROLLOUT_NAME=vllm pytest -s -x tests/experimental/reward_loop/test_agent_loop_reward_manager.py
+ ROLLOUT_NAME=vllm pytest -s -x tests/experimental/reward_loop/test_agent_reward_loop_standalone.py
  - name: Running vllm agent loop with reward model colocate tests on 8 L20 GPUs
  run: |
  unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY

.github/workflows/reward_model_vllm_ascend.yml

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ jobs:
  ROLLOUT_NAME=vllm pytest -s -x tests/experimental/reward_loop/test_reward_model_disrm.py
  - name: Running vllm agent loop with reward manager tests on 8 NPUs
  run: |
- ROLLOUT_NAME=vllm pytest -s -x tests/experimental/reward_loop/test_agent_loop_reward_manager.py
+ ROLLOUT_NAME=vllm pytest -s -x tests/experimental/reward_loop/test_agent_reward_loop_standalone.py
  - name: Running vllm agent loop with reward model colocate tests on 8 NPUs
  run: |
  export HCCL_HOST_SOCKET_PORT_RANGE=auto

.github/workflows/sanity.yml

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ jobs:
  fi
  - name: Assert SGLang naming convention
  run: |
- if grep -rIn --exclude-dir=.git --exclude-dir=.github --exclude-dir=venv --exclude-dir=__pycache__ -E 'Sglang|sgLang|sglAng|sglaNg|sglanG' .; then
+ if grep -rIn --exclude-dir=.git --exclude-dir=.github --exclude-dir=venv --exclude-dir=__pycache__ --exclude=ascend_sglang_best_practices.rst -E 'Sglang|sgLang|sglAng|sglaNg|sglanG' .; then
  echo "Please use SGLang or sglang as the formal name of SGLang rollout engine"
  exit 1
  fi
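Reviewer note: the check above fails the job when any of the five mixed-case spellings appears. The sketch below reproduces the same alternation with Python's `re` module so it can be tried on individual strings. It does not model the directory and file exclusions passed to `grep`. The helper name `find_bad_casing` and the sample sentences are hypothetical.

```python
import re

# Same alternation as the `grep -E` pattern in the workflow step.
BAD_CASING = re.compile(r"Sglang|sgLang|sglAng|sglaNg|sglanG")


def find_bad_casing(text: str) -> list:
    """Return every disallowed spelling found in `text`, in order of appearance."""
    return BAD_CASING.findall(text)


if __name__ == "__main__":
    # Illustrative inputs, not taken from the repository.
    print(find_bad_casing("Use SGLang or sglang as the formal name"))
    print(find_bad_casing("the Sglang rollout engine"))
```

Only spellings listed in the alternation are flagged, so `SGLang` and `sglang` pass, and so would any mixed-case variant that is not one of those five.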

.github/workflows/vllm.yml

Lines changed: 6 additions & 4 deletions
@@ -109,12 +109,13 @@ jobs:
  run: |
  pip3 install -r requirements-test.txt
  pip3 install --no-deps -e .
+ pip3 install --upgrade "transformers<5.0"
  # - name: Download Model to Use
  # run: |
- # huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct
- # huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-1.5B-Instruct
- # huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct
- # huggingface-cli download OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN --local-dir ${HOME}/models/OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN
+ # hf download Qwen/Qwen2.5-0.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-0.5B-Instruct
+ # hf download Qwen/Qwen2.5-1.5B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-1.5B-Instruct
+ # hf download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ${HOME}/models/Qwen/Qwen2.5-VL-3B-Instruct
+ # hf download OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN --local-dir ${HOME}/models/OldKingMeister/Qwen2.5-1.5B-Instruct-YaRN
  # export HF_HUB_OFFLINE=1
  - name: Prepare gsm8k dataset
  run: |
@@ -146,6 +147,7 @@ jobs:
  pip3 install cupy-cuda12x pytest-asyncio
  pip3 install -r requirements-test.txt
  pip3 install --no-deps -e .
+ pip3 install --upgrade "transformers<5.0"
  - name: Test vLLM ServerAdapter with Checkpoint Engine (NCCL)
  run: |
  ROLLOUT_NAME=vllm pytest -svvv tests/checkpoint_engine/test_special_server_adapter.py

README.md

Lines changed: 2 additions & 0 deletions
@@ -282,6 +282,8 @@ Welcome to register your awesome project build with `verl` for other developers'
  - [deepscaler](https://github.com/agentica-project/rllm/tree/deepscaler): iterative context scaling with GRPO ![GitHub Repo stars](https://img.shields.io/github/stars/agentica-project/deepscaler)
  - [DAPO](https://dapo-sia.github.io/): the fully open source SOTA RL algorithm that beats DeepSeek-R1-zero-32B ![GitHub Repo stars](https://img.shields.io/github/stars/volcengine/verl)
  - [NoisyRollout](https://github.com/NUS-TRAIL/NoisyRollout): Reinforcing Visual Reasoning with Data Augmentation ![GitHub Repo stars](https://img.shields.io/github/stars/NUS-TRAIL/NoisyRollout)
+ - [SPEAR](https://github.com/TencentYoutuResearch/SPEAR): **Self-imitation** with **Progressive Exploration** for Agentic Reinforcement Learning (ICLR 2026) ![GitHub Repo stars](https://img.shields.io/github/stars/TencentYoutuResearch/SPEAR)
+ - [RuleReasoner](https://github.com/bigai-nlco/RuleReasoner): **RuleReasoner:** Reinforced Rule-based Reasoning via **Domain-aware Dynamic Sampling** (ICLR 2026) ![GitHub Repo stars](https://img.shields.io/github/stars/bigai-nlco/RuleReasoner)
 
  ## Contribution Guide
 