verl-project
diff --git a/.github/workflows/e2e_ppo_trainer_veomni_vllm.yml b/.github/workflows/e2e_ppo_trainer_veomni_vllm.yml
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/gpu_unit_tests.yml b/.github/workflows/gpu_unit_tests.yml
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/sgl.yml b/.github/workflows/sgl.yml
Lines changed: 2 additions & 2 deletions
diff --git a/.github/workflows/vllm.yml b/.github/workflows/vllm.yml
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/ascend_tutorial/ascend_quick_start.rst b/docs/ascend_tutorial/ascend_quick_start.rst
Lines changed: 20 additions & 19 deletions
diff --git a/docs/ascend_tutorial/dockerfile_build_guidance.rst b/docs/ascend_tutorial/dockerfile_build_guidance.rst
Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ jobs:
       - name: Running GEO3K E2E training tests on 8 L20 GPUs with veomni engine (FSDP_SIZE=8, USP=1)
         run: |
           ray stop --force
-          MODEL_ID=Qwen/Qwen3-VL-2B-Instruct TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/gsm8k/test.parquet VAL_BEFORE_TRAIN=True NUM_GPUS=8 FSDP_SIZE=8 SP_SIZE=1 EP_SIZE=1 VERL_EXP_NAME="qwen3-2b-vl-function-reward-minimal-fsdp-size8" bash tests/special_e2e/run_ppo_trainer_veomni.sh
+          MODEL_ID=Qwen/Qwen3-VL-2B-Instruct TRAIN_FILES=${HOME}/data/geo3k/train.parquet VAL_FILES=${HOME}/data/gsm8k/test.parquet VAL_BEFORE_TRAIN=True NUM_GPUS=8 FSDP_SIZE=4 SP_SIZE=2 EP_SIZE=1 VERL_EXP_NAME="qwen3-2b-vl-function-reward-minimal-fsdp-size8" bash tests/special_e2e/run_ppo_trainer_veomni.sh
 
   cleanup:
     runs-on: ubuntu-latest
 
@@ -108,7 +108,7 @@ jobs:
           pip3 install hf_transfer
           pip3 install -r requirements-test.txt
           pip3 install --no-deps -e .
-          pip3 install cupy-cuda12x pytest-asyncio
+          pip3 install cupy-cuda12x==13.6.0 pytest-asyncio
           pip3 install --ignore-installed blinker
           pip3 install --ignore-installed mlflow "numpy<2.0"
       - name: Run all GPU unit tests
 
@@ -113,7 +113,7 @@ jobs:
           fetch-depth: 0
       - name: Install the current repository
         run: |
-          pip3 install cupy-cuda12x pytest-asyncio
+          pip3 install cupy-cuda12x==13.6.0 pytest-asyncio
           pip3 install hf_transfer fastmcp pytest-asyncio
           pip3 install -r requirements-test.txt
           pip3 install --no-deps -e .
@@ -144,7 +144,7 @@ jobs:
           fetch-depth: 0
       - name: Install the current repository
         run: |
-          pip3 install cupy-cuda12x pytest-asyncio
+          pip3 install cupy-cuda12x==13.6.0 pytest-asyncio
           pip3 install hf_transfer fastmcp pytest-asyncio
           pip3 install -r requirements-test.txt
           pip3 install --no-deps -e .
 
@@ -144,7 +144,7 @@ jobs:
           fetch-depth: 0
       - name: Install the current repository
         run: |
-          pip3 install cupy-cuda12x pytest-asyncio
+          pip3 install pytest-asyncio
           pip3 install -r requirements-test.txt
           pip3 install --no-deps -e .
           pip3 install --upgrade "transformers<5.0"
 
@@ -283,6 +283,7 @@ Welcome to register your awesome project build with `verl` for other developers'
 - [NoisyRollout](https://github.com/NUS-TRAIL/NoisyRollout): Reinforcing Visual Reasoning with Data Augmentation ![GitHub Repo stars](https://img.shields.io/github/stars/NUS-TRAIL/NoisyRollout)
 - [SPEAR](https://github.com/TencentYoutuResearch/SPEAR): **Self-imitation** with **Progressive Exploration** for Agentic Reinforcement Learning (ICLR 2026) ![GitHub Repo stars](https://img.shields.io/github/stars/TencentYoutuResearch/SPEAR)
 - [RuleReasoner](https://github.com/bigai-nlco/RuleReasoner): **RuleReasoner:** Reinforced Rule-based Reasoning via **Domain-aware Dynamic Sampling** (ICLR 2026) ![GitHub Repo stars](https://img.shields.io/github/stars/bigai-nlco/RuleReasoner)
+- [MetaphorStar](https://metaphorstar.github.io/): **Image Metaphor** Understanding and Reasoning with End-to-End **Visual Reinforcement Learning** ![GitHub Repo stars](https://img.shields.io/github/stars/MING-ZCH/MetaphorStar)
 
 ## Contribution Guide
 
 
@@ -1,7 +1,7 @@
 Ascend Quickstart
 ===================================
 
-Last updated: 12/11/2025.
+Last updated: 2/13/2026.
 
 We have added support for Huawei Ascend devices to verl.
 
@@ -67,19 +67,20 @@ Dockerfile image build & usage
     +---------------+----------------------+
     | triton-ascend | == 3.2.0rc4          |
     +---------------+----------------------+
-    | transformers  | latest release       |
+    | transformers  | == 4.57.6            |
     +---------------+----------------------+
-
+    
+    Tip: verl does not support transformers 5.0.0 or higher.
     Installation commands:
-
+    
     .. code-block:: bash
-
+    
         # Install torchvision; the version must match torch
         pip install torchvision==0.22.1
-
+    
         # Remove any leftover triton/triton-ascend packages from the environment
         pip uninstall -y triton triton-ascend
-
+    
         # Install triton-ascend; triton does not need to be installed separately
         pip install triton-ascend==3.2.0rc4
 
@@ -115,30 +116,30 @@ Dockerfile image build & usage
 Commands for installing MindSpeed from source:
 
     .. code-block:: bash
-
+    
         # Download MindSpeed, check out the specified commit ID, and download Megatron-LM
         git clone https://gitcode.com/Ascend/MindSpeed.git
         cd MindSpeed && git checkout f2b0977e && cd ..
         git clone --depth 1 --branch core_v0.12.1 https://github.com/NVIDIA/Megatron-LM.git
-
+    
         # Install MindSpeed & Megatron
         pip install -e MindSpeed
-
+    
         # Add the Megatron-LM source path to the PYTHONPATH environment variable
         export PYTHONPATH=$PYTHONPATH:"$(pwd)/Megatron-LM"
-
+    
         # (Optional) To keep PYTHONPATH set after the shell closes or the system reboots, add it to .bashrc
         echo "export PYTHONPATH=$PYTHONPATH:\"$(pwd)/Megatron-LM\"" >> ~/.bashrc
-
+    
         # Install mbridge
         pip install mbridge
 
 MindSpeed is used with the Megatron-LM backend, as follows:
 
     1. Set the verl worker model ``strategy`` to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.
-
+    
     2. Custom MindSpeed arguments can be passed through the ``override_transformer_config`` parameter; for example, to enable the FA feature for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``.
-
+    
     3. For more feature information, see the `MindSpeed & verl documentation <https://gitcode.com/Ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_.
 
 
@@ -163,7 +164,7 @@ Ecosystem libraries not yet supported on Ascend in verl:
     +---------------+----------------+
     | liger-kernel  | not supported  |
     +---------------+----------------+
-
+    
     1. Flash attention acceleration cannot be enabled through flash_attn, but it can be used through transformers.
     2. Enabling liger-kernel is not supported.
 
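@@ -175,17 +176,17 @@ Ecosystem libraries not yet supported on Ascend in verl:
 1. Download the dataset and preprocess it into parquet format so that it contains the fields needed to compute RL rewards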
 
     .. code-block:: bash
-
+    
         python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k
 
 2. Run training
 
     .. code-block:: bash
-
+    
         set -x
-
+    
         export VLLM_ATTENTION_BACKEND=XFORMERS
-
+    
         python3 -m verl.trainer.main_ppo \
             algorithm.adv_estimator=grpo \
             data.train_files=$HOME/data/gsm8k/train.parquet \
 
@@ -66,7 +66,7 @@ A3              8.3.RC1         SGLang          `Dockerfile.ascend.sglang_8.3.rc
    # vLLM
    docker build -f Dockerfile.ascend_8.3.rc1_a2 -t verl-ascend:8.3.rc1-a2 .
    # SGLang
-   docker build -f Dockerfile.ascend_8.3.rc1_a2 -t verl-ascend-sglang:8.3.rc1-a2 .
+   docker build -f Dockerfile.ascend.sglang_8.3.rc1_a2 -t verl-ascend-sglang:8.3.rc1-a2 .
 
 Public image addresses
 ----------------------