Commit dc225e1

SkychenLee and l00832868 authored
[v0.13.0][Lora][BugFix] Fix crash on base model requests with LoRA enabled (#6457)
Problem: when LoRA is used in compile mode, requests to a LoRA module succeed, but requests to the base model fail and the model process core-dumps.

### What this PR does / why we need it?
When a model is started with LoRA, LoRA requests work, but a request to the base model causes the model process to core-dump (a dangerous problem).

Related issues: #6279

### Does this PR introduce _any_ user-facing change?
No user-facing change.

### How was this patch tested?
vLLM version: v0.13.0rc2

Signed-off-by: l00832868 <[email protected]>
Co-authored-by: l00832868 <[email protected]>
1 parent cb1212f commit dc225e1

File tree

1 file changed (+2, -3 lines)


vllm_ascend/worker/model_runner_v1.py

Lines changed: 2 additions & 3 deletions
@@ -2131,9 +2131,8 @@ def _dummy_run(
         if self.is_kv_producer and not self.is_kv_consumer:
             with_prefill = True

-        has_lora = True if self.lora_config and self.compilation_config.cudagraph_specialize_lora else False
         _ag_mode, batch_descriptor = \
-            self.cudagraph_dispatcher.dispatch(num_tokens=num_tokens, uniform_decode=uniform_decode, has_lora=has_lora)
+            self.cudagraph_dispatcher.dispatch(num_tokens=num_tokens, uniform_decode=uniform_decode, has_lora=activate_lora)

         # Padding for DP
         (num_tokens, num_tokens_across_dp, with_prefill,
@@ -2189,7 +2188,7 @@ def _dummy_run(
         _ag_mode, batch_descriptor = self.cudagraph_dispatcher.dispatch(
             num_tokens=num_tokens,
             uniform_decode=uniform_decode,
-            has_lora=has_lora,
+            has_lora=activate_lora,
             disable_full=synced_cudagraph_mode
                 <= CUDAGraphMode.PIECEWISE.value)

