
Commit 922e5c1 (1 parent: 2c16082)

Authored by Meihan-chen, wangxiyuan, zhangxinyue, hfad
[main2main] upgrade vllm main 0202 (#6560)
### What this PR does / why we need it?

1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required positional argument: 'is_sequence_parallel'` due to vllm-project/vllm#32567.
2. Fix `TypeError: '>' not supported between instances of 'MagicMock' and 'int'` due to vllm-project/vllm#33035.
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with abstract methods forward_mha, forward_mqa` and `AttributeError: 'bool' object has no attribute 'process_weights_after_loading'` due to vllm-project/vllm#33284.
4. Fix `'AscendSharedFusedMoE' object has no attribute '_routed_input_transform'` due to vllm-project/vllm#32790.
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument 'num_active_loras'` due to vllm-project/vllm#32005.
6. Fix the problem caused by `'tuple' object has no attribute 'job_id'` due to vllm-project/vllm#27492.
7. Fix the mismatch where `all_moe_layers` does not equal `vllm.moe_forward` / `vllm.moe_forward_shared` due to vllm-project/vllm#33184.
8. Add a patch to fix `got multiple values for keyword argument 'add_special_tokens'` due to vllm-project/vllm#32863.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
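Several of these breakages are handled with the same version-gating shim: branch on the installed vLLM version via `vllm_ascend.utils.vllm_version_is` and call the old or the new API accordingly (the `tests/ut/eplb` diff below shows the real instance). A minimal sketch of the pattern for fix 1, with the positional arguments copied from that test rather than from any real deployment:

```python
from vllm.model_executor.layers.fused_moe.config import FusedMoEParallelConfig

from vllm_ascend.utils import vllm_version_is

if vllm_version_is("0.15.0"):
    # Old signature on the v0.15.0 tag: no is_sequence_parallel parameter.
    moe_parallel_config = FusedMoEParallelConfig(
        2, 0, 1, 2, 1, 1, 1, 1, True, "hccl", enable_eplb=True)
else:
    # vllm-project/vllm#32567 made is_sequence_parallel a required argument on main.
    moe_parallel_config = FusedMoEParallelConfig(
        2, 0, 1, 2, 1, 1, 1, 1, True, "hccl",
        is_sequence_parallel=False, enable_eplb=True)
```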

File tree

28 files changed: +246, -30 lines

.github/workflows/_pre_commit.yml

Lines changed: 3 additions & 0 deletions

```diff
@@ -38,6 +38,7 @@ jobs:
           repository: vllm-project/vllm
           path: ./vllm-empty
           ref: ${{ inputs.vllm }}
+
       - uses: dorny/paths-filter@v3
         id: filter
         with:
@@ -62,10 +63,12 @@ jobs:
         run: |
           git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
           pre-commit run --all-files --hook-stage manual --show-diff-on-failure
+
       - name: Run mypy
         run: |
           PYTHONPATH="$PYTHONPATH:$(pwd)/vllm-empty"
           export PYTHONPATH
+          env
           git config --global --add safe.directory /__w/vllm-ascend/vllm-ascend
           # Run mypy for Python 3.10, 3.11, 3.12 manually
           # Note: We are now separating mypy from pre-commit hooks for performance reasons.
```

.github/workflows/bot_pr_create.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -37,7 +37,7 @@ jobs:
     steps:
       - name: Get vLLM version
         run: |
-          VLLM_COMMIT=v0.15.0
+          VLLM_COMMIT=d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a
           echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> "$GITHUB_ENV"

       - name: Checkout repository
```

.github/workflows/dockerfiles/Dockerfile.lint

Lines changed: 1 addition & 1 deletion

```diff
@@ -27,7 +27,7 @@ RUN apt-get update -y && \

 ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
 # For lint purpose, actually we need make a main2main matching.
-ARG VLLM_COMMIT=v0.15.0
+ARG VLLM_COMMIT=d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a
 RUN git clone $VLLM_REPO /vllm-workspace/vllm && \
     cd /vllm-workspace/vllm && \
     git checkout $VLLM_COMMIT
```

.github/workflows/pr_test_full.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -75,7 +75,7 @@ jobs:
     name: e2e-full
     strategy:
       matrix:
-        vllm_version: [v0.15.0]
+        vllm_version: [d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a, v0.15.0]
     needs: [changes]
     if: ${{ needs.changes.outputs.e2e_tracker == 'true' || needs.changes.outputs.e2e_tracker == true }}
     uses: ./.github/workflows/_e2e_test.yaml
```

.github/workflows/pr_test_light.yaml

Lines changed: 3 additions & 3 deletions

```diff
@@ -41,7 +41,7 @@ jobs:
   lint:
     uses: ./.github/workflows/_pre_commit.yml
     with:
-      vllm: v0.15.0
+      vllm: d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a
   changes:
     runs-on: linux-aarch64-a2-0
     outputs:
@@ -87,7 +87,7 @@ jobs:
     if: ${{ needs.lint.result == 'success' && (needs.changes.outputs.e2e_tracker == 'true' || needs.changes.outputs.ut_tracker == 'true') }}
     strategy:
       matrix:
-        vllm_version: [v0.15.0]
+        vllm_version: [d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a, v0.15.0]
     uses: ./.github/workflows/_unit_test.yaml
     with:
       vllm: ${{ matrix.vllm_version }}
@@ -99,7 +99,7 @@ jobs:
     name: e2e-light
     strategy:
       matrix:
-        vllm_version: [v0.15.0]
+        vllm_version: [d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a, v0.15.0]
     # Note (yikun): If CI resource are limited we can split job into two chain jobs
     needs: [lint, changes]
     # only trigger e2e test after lint passed and the change is e2e related with pull request.
```

.github/workflows/schedule_codecov_refresh.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -33,7 +33,7 @@ jobs:
     name: refresh codecov
     strategy:
       matrix:
-        vllm_version: [v0.15.0]
+        vllm_version: [d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a]
     uses: ./.github/workflows/_unit_test.yaml
     with:
       vllm: ${{ matrix.vllm_version }}
```

docs/source/community/versioning_policy.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -55,7 +55,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL

 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |-------------|--------------|------------------|-------------|--------------------|
-| main | v0.15.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |
+| main | d7e17aaacd5ed1b4b4be6bcfef3a1b7cbc84fc9a, v0.15.0 tag | >= 3.10, < 3.12 | 8.5.0 | 2.9.0 / 2.9.0 |

 ## Release cadence
```

tests/e2e/conftest.py

Lines changed: 4 additions & 1 deletion

```diff
@@ -922,4 +922,7 @@ def hunyuan_prompt(questions: list[str]) -> list[str]:

 @pytest.fixture(params=PROMPT_CONFIGS.keys())
 def vl_config(request):
-    return PROMPT_CONFIGS[request.param]
+    config = PROMPT_CONFIGS[request.param]
+    if "skip" in config:
+        pytest.skip(config["skip"])
+    return config
```
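With this change, any entry in `PROMPT_CONFIGS` can opt out of the parametrized `vl_config` fixture by carrying a `skip` key, whose value becomes the reason passed to `pytest.skip`. A hypothetical entry for illustration; only the `skip` key is exercised by the fixture above, and the other fields are assumed placeholders, not taken from this commit:

```python
# Hypothetical PROMPT_CONFIGS entry (illustrative model name and fields).
PROMPT_CONFIGS = {
    "example-vl-model": {
        "model": "example/vl-model",                 # assumed field
        "skip": "unsupported on this vLLM commit",   # reason forwarded to pytest.skip
    },
}
```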

tests/ut/eplb/core/test_eplb_utils.py

Lines changed: 8 additions & 1 deletion

```diff
@@ -9,6 +9,7 @@

 from vllm_ascend.ascend_config import init_ascend_config
 from vllm_ascend.eplb.core.eplb_utils import init_eplb_config
+from vllm_ascend.utils import vllm_version_is
 # isort: on


@@ -21,7 +22,13 @@ def setUp(self, mock_fix_incompatible_config):
             "eplb_config": {"dynamic_eplb": True, "num_redundant_experts": 2},
         }
         from vllm.model_executor.layers.fused_moe.config import RoutingMethodType
-        moe_parallel_config = FusedMoEParallelConfig(2, 0, 1, 2, 1, 1, 1, 1, True, "hccl", enable_eplb=True)
+        if vllm_version_is("0.15.0"):
+            moe_parallel_config = FusedMoEParallelConfig(
+                2, 0, 1, 2, 1, 1, 1, 1, True, "hccl", enable_eplb=True)
+        else:
+            moe_parallel_config = FusedMoEParallelConfig(
+                2, 0, 1, 2, 1, 1, 1, 1, True, "hccl",
+                is_sequence_parallel=False, enable_eplb=True)
         moe_config = FusedMoEConfig(
             num_experts=8,
             experts_per_token=8,
```

tests/ut/ops/test_mla.py

Lines changed: 14 additions & 2 deletions

```diff
@@ -82,8 +82,13 @@ def setUp(self):
     @patch("vllm_ascend.ops.mla.get_tensor_model_parallel_world_size")
     def test_initialization(self, mock_tp_size, mock_ascend_config,
                             mock_get_vllm_config):
+        # Create a proper mock for MLAAttention that has the required attributes
+        mock_mla_attn = MagicMock()
+        mock_mla_attn.process_weights_after_loading = MagicMock()
+        mock_mla_attn.impl = MagicMock()
+        mock_mla_attn.impl.process_weights_after_loading = MagicMock()

-        with patch("vllm_ascend.ops.mla.MLAAttention", return_value=True):
+        with patch("vllm_ascend.ops.mla.MLAAttention", return_value=mock_mla_attn):
             mock_tp_size.return_value = 2
             mock_ascend_config.return_value.enable_shared_expert_dp = True
             mock_vllm_config = MagicMock(spec=VllmConfig)
@@ -126,7 +131,14 @@ def test_forward(self, mock_get_forward_context, mock_tp_size,
                 num_hidden_layers=32, first_k_dense_replace=False)
         mock_get_vllm_config.return_value = mock_vllm_config
         mock_vllm_config.compilation_config = CompilationConfig()
-        with patch("vllm_ascend.ops.mla.MLAAttention", return_value=True):
+
+        # Create a proper mock for MLAAttention that has the required attributes
+        mock_mla_attn = MagicMock()
+        mock_mla_attn.process_weights_after_loading = MagicMock()
+        mock_mla_attn.impl = MagicMock()
+        mock_mla_attn.impl.process_weights_after_loading = MagicMock()
+
+        with patch("vllm_ascend.ops.mla.MLAAttention", return_value=mock_mla_attn):
             attn = AscendMultiHeadLatentAttention(
                 hidden_size=self.hidden_size,
                 num_heads=self.num_heads,
```
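Fix 2 in the description (`TypeError: '>' not supported between instances of 'MagicMock' and 'int'`) follows the same principle as the mock changes above: an auto-created `MagicMock` attribute cannot be ordered against an int, so any attribute the code under test compares numerically must be pinned to a concrete value. A minimal standalone sketch; the attribute name is illustrative, not taken from this diff:

```python
from unittest.mock import MagicMock

cfg = MagicMock()

# Accessing an unset attribute yields another MagicMock, so
# `cfg.max_num_seqs > 0` would raise:
# TypeError: '>' not supported between instances of 'MagicMock' and 'int'

# Pinning a real int makes the comparison well-defined.
cfg.max_num_seqs = 256
assert cfg.max_num_seqs > 0
```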
