v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations #2225

eric-haibin-lin · 2025-06-27T00:13:20Z

eric-haibin-lin
Jun 27, 2025
Maintainer

v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations

Key changes

PPO fixes and enhancements

Fixed a bug related to vf_loss coefficient for PPO, which was introduced in v0.4 [algo] fix: vf_loss factor #2016
Improved numerical stability when clamping KL divergence-related values Stabilize loss calculations by clamping KL divergence values #1779

Checkpoints related

Switched Megatron checkpointer to mcore's dist_checkpoint, which reduces peak memory usage and improves distributed model saving performance via *.checkpoint.async_save=True.
[BREAKING] Megatron's checkpoint directory layout is updated accordingly. Documentation
[BREAKING] Checkpoint manager constructor now takes checkpoint_config as the keyword to replace checkpoint_contents [megatron] feat: Support of dist checkpoint #2125
Checkpoint merger for LoRA is fixed [ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters #1821 via python -m verl.model_merger merge .... Documentation

Experimental function calling & MCP interfaces

These features are experimental and subject to changes in the future

Chat completion scheduler now speaks the OpenAI function-calling schema with an OpenAI server [rollout] feat: follow OpenAI tool calling schema in chat scheduler #1831
SGLang rollout with MCP client [tool] feat: Add Search Tool implemented with MCP #1948 Documentation
SGLang multi-turn rollout code walk-through documentation
Multi-turn interaction system with SGLang, enabling dynamic conversational feedback and iterative problem-solving scenarios [sglang] feat: Support async multi-turn rollout with simulation feedback in sglang #1630, the building block for SCoRe

New models and recipes

New recipe/entropy to reproduce the paper The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning with Clip-Cov and KL-Cov methods
Megatron support for Qwen-2.5-VL [megatron] feat: qwen2.5vl #1286
Multi-turn SFT support for Qwen-3 [training_utils] Add qwen3 multi-turn sft support #1889
Enhanced kimi-vl with sequence parallelism fix sequence parallelism conflict in kimiVL #1899

SGLang optimizations

rollout with SGLang memory usage is further optimized. Blog
async multi-turn rollout with multi-modal support now available in SGLang [sglang] feat: add multimodal input to multiturn async rollout #2014

Other performance profiling & optimizations

Nsight system profiling is available. Documentation
FSDP prefetch can be enabled via [actor|ref].fsdp_config.forward_prefetch=True [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy #1927
The memory usage for entropy computation can be drastically reduced with fused kernels using [actor|ref].entropy_checkpointing=True and [actor|ref].entropy_from_logits_with_chunking=True [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy #1927

Other breaking changes and deprecations
See #1902

What's Changed

[feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding by @ETOgaosion in [feat] Wandb Timing: Add more detailed timing of gen_sequence and weights resharding #1834
[rollout] feat: follow OpenAI tool calling schema in chat scheduler by @wuxibin89 in [rollout] feat: follow OpenAI tool calling schema in chat scheduler #1831
[release] chore: bump version to v0.4 by @eric-haibin-lin in [release] chore: bump version to v0.4 #1897
Dockerfile.rocm update tensordict==0.6.2 by @vickytsang in Dockerfile.rocm update tensordict==0.6.2 #1898
[feat] add validation shuffle by @mlpod in [feat] add validation shuffle #1886
[feat][BREAKING] Megatron: Support learning rate scheduler by @ETOgaosion in [feat][BREAKING] Megatron: Support learning rate scheduler #1701
fix errors in megatron_workers.py by @davidjsonn in fix errors in megatron_workers.py #1906
[tests] chore: add PR title check by @eric-haibin-lin in [tests] chore: add PR title check #1901
fix qwen2vl grpo for vllm 0.9 and transformers 4.52 by @hiyouga in fix qwen2vl grpo for vllm 0.9 and transformers 4.52 #1880
[rollout] fix: error in __collect_lora_params() in FSDPVLLMShardingManager by @rocke2020 in [rollout] fix: error in __collect_lora_params() in FSDPVLLMShardingManager #1909
[recipe] feat: char count by @vermouth1992 in [recipe] feat: char count #1908
fix typos by @davidjsonn in fix typos #1912
[trainer] refactor: refactor reward manager, advantage estimator by @eric-haibin-lin in [trainer] refactor: refactor reward manager, advantage estimator #1916
set CUDA and HIP VISIBLE DEVICES by @YangWang92 in set CUDA and HIP VISIBLE DEVICES #1914
[ppo] feat: add critic valuehead model support for multi-modal PPO by @Yangruipis in [ppo] feat: add critic valuehead model support for multi-modal PPO #1839
[bugfix] fix megatron model merger by @ShareLer in [bugfix] fix megatron model merger #1774
revert HIP_VISIBLE_DEVICES in worker.py by @YangWang92 in revert HIP_VISIBLE_DEVICES in worker.py #1920
[worker] fix: do not break dynamic bsz in dp critic by @hiyouga in [worker] fix: do not break dynamic bsz in dp critic #1922
[sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking by @jybsuper in [sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking #1668
[rollout] fix: fix async llm config passing by @eric-haibin-lin in [rollout] fix: fix async llm config passing #1933
[sglang] fix: Fix tool call parser not found error for SGLang==0.4.6.post5 by @jybsuper in [sglang] fix: Fix tool call parser not found error for SGLang==0.4.6.post5 #1852
fix sequence parallelism conflict in kimiVL by @ShareLer in fix sequence parallelism conflict in kimiVL #1899
[megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 by @jinqinn in [megatron] refactor: support MLATransformerConfig abstraction for DeepSeek V3 #1836
[rollout] feat: add async llm perf script by @wuxibin89 in [rollout] feat: add async llm perf script #1930
[megatron] feat: qwen2.5vl by @ISEEKYAN in [megatron] feat: qwen2.5vl #1286
[ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters by @thelongestusernameofall in [ckpt] feat: model_merger.py support processing checkpoints with LoRA adapters #1821
[hardware] fix: fix issue when sp>1 on ASCEND NPU by @as12138 in [hardware] fix: fix issue when sp>1 on ASCEND NPU #1942
[megatron] fix: rope_type typo in config_converter.py by @donpromax in [megatron] fix: rope_type typo in config_converter.py #1944
[training_utils] Add qwen3 multi-turn sft support by @SwordFaith in [training_utils] Add qwen3 multi-turn sft support #1889
[fsdp] fix: fsdp entropy metrics by @ETOgaosion in [fsdp] fix: fsdp entropy metrics #1943
[FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy by @CurryRice233 in [FSDP] feat: Add FSDP forward pefetch and recompute chunking entropy #1927
[rollout] fix: set repetition_penalty=1.0 to AsyncLLM by @wuxibin89 in [rollout] fix: set repetition_penalty=1.0 to AsyncLLM #1949
[fsdp] feat: Memory efficient cross entropy with a linear layer fused by @Jianbing-D in [fsdp] feat: Memory efficient cross entropy with a linear layer fused #462
[recipe] feat: qwen2.5vl 7b report and guide by @ISEEKYAN in [recipe] feat: qwen2.5vl 7b report and guide #1969
[ckpt] refactor: enhance FSDP checkpoint manager flexibility by @0x404 in [ckpt] refactor: enhance FSDP checkpoint manager flexibility #1350
[env] fix: npu ray verion to 2.46.0 for CI problem by @wyz649296016 in [env] fix: npu ray verion to 2.46.0 for CI problem #1987
Fix TypeError by Removing Duplicate Arguments in run_deepseek671b_math_megatron.sh by @none0663 in Fix TypeError by Removing Duplicate Arguments in run_deepseek671b_math_megatron.sh #1996
[megatron] feat: Config NCCL Timeout for Megatron Backend Model Loading by @none0663 in [megatron] feat: Config NCCL Timeout for Megatron Backend Model Loading #1983
[tests] chore: ppo workflow runs on volcengine machine learning platform by @htc070011 in [tests] chore: ppo workflow runs on volcengine machine learning platform #1979
[megatron] fix: multiple key error when trying to override megatron tr… by @donpromax in [megatron] fix: multiple key error when trying to override megatron tr… #1990
[megatron] feat: robust and efficient mcore converter with meta device init and numel check for dpsk by @Yangruipis in [megatron] feat: robust and efficient mcore converter with meta device init and numel check for dpsk #1995
Stabilize loss calculations by clamping KL divergence values by @syo093c in Stabilize loss calculations by clamping KL divergence values #1779
[ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError by @lxg2015 in [ckpt] fix: run converter_hf_to_mcore with --test will raise an AttributeError #2010
[algo] fix: vf_loss factor by @tongyx361 in [algo] fix: vf_loss factor #2016
[data] fix: fix retool sft data source by @vermouth1992 in [data] fix: fix retool sft data source #2018
[fsdp] fix: position_ids in qwen-vl by @ShareLer in [fsdp] fix: position_ids in qwen-vl #1947
[hardware] refactor: refactor part of device management by @FightingZhen in [hardware] refactor: refactor part of device management #1974
[trainer] fix: fix sft max_position_embeddings by @vermouth1992 in [trainer] fix: fix sft max_position_embeddings #2019
[misc] fix: fix format by @vermouth1992 in [misc] fix: fix format #2023
[megatron] fix: dpskv3 convert src and dst mixed up bug by @Yangruipis in [megatron] fix: dpskv3 convert src and dst mixed up bug #2029
fix: TensorDict usage error by @zhihe-wang in fix: TensorDict usage error #2046
[hardware] feat: support qwen2_5_vl on ASCEND NPU by @as12138 in [hardware] feat: support qwen2_5_vl on ASCEND NPU #1924
[trainer] chore: Reducing the number of calls to the write by @RuixiangMa in [trainer] chore: Reducing the number of calls to the write #2043
[Bug] fix None check in DataProto print_size() by @GHGmc2 in [Bug] fix None check in DataProto print_size() #2067
[perf] feat: Add verl profiling support from Nvidia Nsight System by @davidmlw in [perf] feat: Add verl profiling support from Nvidia Nsight System #1820
[data] fix: multimodal overlong prompt length filtering by @dirtyDan0 in [data] fix: multimodal overlong prompt length filtering #2063
[sglang] fix: AsyncSglangServer use async wake_up/sleep by @feifeibear in [sglang] fix: AsyncSglangServer use async wake_up/sleep #2062
[training_utils] feat: Add project and experiment name to tensorboard log path by @Geaming2002 in [training_utils] feat: Add project and experiment name to tensorboard log path #2080
[trainer] fix: Fix trainer config for val_only by @hscspring in https://github.com/volcengine/verl/pull/20842083
[megatron] fix: fix qwen2_vl on plain-text data and mix data of plain-text and image-text by @MaoChouHJM in [megatron] fix: fix qwen2_vl on plain-text data and mix data of plain-text and image-text #1999
[vllm] fix: mv disable_mm_preprocessor_cache to vllm engine_kwargs by @yyDing1 in [vllm] fix: mv disable_mm_preprocessor_cache to vllm engine_kwargs #2068
[misc] feat: update instruction for running dapo on qwen2.5 7b math and add reference wandb by @vermouth1992 in [misc] feat: update instruction for running dapo on qwen2.5 7b math and add reference wandb #2094
[rollout] refactor: Add option for rollout_log_probs, and default as False by @GHGmc2 in [rollout] refactor: Add option for rollout_log_probs, and default as False #2072
[tool] feat: Add Search Tool implemented with MCP by @AlecHenx in [tool] feat: Add Search Tool implemented with MCP #1948
[trainer] fix: make reward_extra_info optional in reward_result by @HollowMan6 in [trainer] fix: make reward_extra_info optional in reward_result #2109
[algo] feat: integrate Clip-Cov and KL-Cov methods by @Raf-Chen in [algo] feat: integrate Clip-Cov and KL-Cov methods #1830
[rollout] fix: error in sgyang async mode by @chenhaiq in [rollout] fix: error in sgyang async mode #2098
[rollout] fix: fix rollout key not found by @ETOgaosion in [rollout] fix: fix rollout key not found #2116
[recipe] feat: Move entropy reward to the entropy recipe by @Raf-Chen in [recipe] feat: Move entropy reward to the entropy recipe #2118
[cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerProfiler to DistProfiler, add unit test based on ProfilerConfig by @eric-haibin-lin in [cfg, perf] refactor: add omega_conf_to_dataclass API, rename WorkerProfiler to DistProfiler, add unit test based on ProfilerConfig #2117
[worker] feat: add support for dynamic batch size of multimodal data by @wang-zerui in [worker] feat: add support for dynamic batch size of multimodal data #2049
[fsdp] refactor: set actor's strategy as default for critic and ref by @0x404 in [fsdp] refactor: set actor's strategy as default for critic and ref #2130
[ray] feat: add a test to demonstrate how to perform p2p communication inside wor… by @vermouth1992 in [ray] feat: add a test to demonstrate how to perform p2p communication inside wor… #2131
[sglang] feat: Support async multi-turn rollout with simulation feedback in sglang by @kinza99 in [sglang] feat: Support async multi-turn rollout with simulation feedback in sglang #1630
[tool] feat: Add memory limit configuration for sandbox fusion by @plutoZZZZ in [tool] feat: Add memory limit configuration for sandbox fusion #2105
[sglang] feat: add multimodal input to multiturn async rollout by @nanjiangwill in [sglang] feat: add multimodal input to multiturn async rollout #2014
[fsdp] feat: support fsdp2 save hugging face model by @0x404 in [fsdp] feat: support fsdp2 save hugging face model #2138
[rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True by @zyfzjsc988 in [rollout]fix: vllm_rollout_spmd.py when return_raw_chat=True #2156
[rollout] feat: Support Multi-stage Awake for SGLang by @hebiao064 in [rollout] feat: Support Multi-stage Awake for SGLang #1911
[worker] feat: allow dist shared file-system initialization by @Cccei000 in [worker] feat: allow dist shared file-system initialization #2154
[model] feat: Add MiniCPM-o 2.6 support by @RanchiZhao in [model] feat: Add MiniCPM-o 2.6 support #1833
[model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" by @hiyouga in [model] fix: Revert "[model] feat: Add MiniCPM-o 2.6 support" #2176
[misc] fix: fix timer importance error in split_placement by @FightingZhen in [misc] fix: fix timer importance error in split_placement #2169
[megatron,vllm] fix: megatron vllm async rollout server by @Yangruipis in [megatron,vllm] fix: megatron vllm async rollout server #2122
[model] feat: Add MiniCPM-o 2.6 support by @hiyouga in [model] feat: Add MiniCPM-o 2.6 support #2178
[megatron] feat: Support of dist checkpoint by @ETOgaosion in [megatron] feat: Support of dist checkpoint #2125
[data] fix: fix the type of parquet_files in SFTDataset by @xuuHuang in [data] fix: fix the type of parquet_files in SFTDataset #2203
[trainer] fix: add missing qwen2_moe flops counter by @ETOgaosion in [trainer] fix: add missing qwen2_moe flops counter #2190
[trainer] fix: Add init.py to verl.trainer.config by @ultmaster in [trainer] fix: Add __init__.py to verl.trainer.config #2214
[model] fix: make vlm patch forward compatible by @hiyouga in [model] fix: make vlm patch forward compatible #2215
[recipe] fix: parameter order in RayPRIMETrainer super().init() call by @xxnpark in [recipe] fix: parameter order in RayPRIMETrainer super().__init__() call #2172
[misc] feat: support ValidationGenerationsLogger in vemlp_wandb by @chenhaiq in [misc] feat: support ValidationGenerationsLogger in vemlp_wandb #2191

New Contributors

Thank you all for joining this project!

@vickytsang @davidjsonn @rocke2020 @vwxyzjn @Yangruipis @SeungyounShin @donpromax @leopardracer @ZhiyuLi-Nvidia @LiyuanLucasLiu @Jianbing-D @wyz649296016 @htc070011 @syo093c @FightingZhen @zhihe-wang @KaiChen1998 @wizeng23 @RuixiangMa @davidmlw @feifeibear @hscspring @MaoChouHJM @AlecHenx @wang-zerui @kinza99 @nanjiangwill @zyfzjsc988 @Cccei000 @RanchiZhao @xuuHuang @ultmaster @xxnpark @jvmncs @xingyunjohn1

Full Changelog: v0.4.0...v0.4.1

This discussion was created from the release v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations #2225

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations #2225

Uh oh!

eric-haibin-lin Jun 27, 2025 Maintainer

v0.4.1 patch release: checkpoint fixes for MoE EP & LoRA, OpenAI/MCP tool calling schema, and SGLang memory optimizations

Key changes

What's Changed

New Contributors

Replies: 0 comments

eric-haibin-lin
Jun 27, 2025
Maintainer