Commit f4a72f0

[CI]Disable early exit to complete all tests (#6482)
### What this PR does / why we need it?

1. Disable the feature to exit early upon encountering an error in order to complete all tests.
2. Within each partition, tests are re-sorted by `estimated_time` in ascending order. This allows the CI to cover as many test cases as possible in the early stages.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: vllm-project/vllm@dc917cc

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
1 parent dffac6d commit f4a72f0

3 files changed: +10 -2 lines changed


.github/workflows/READMD.md

Lines changed: 3 additions & 1 deletion
@@ -47,8 +47,10 @@ To speed up CI execution, we support splitting large test suites into multiple p
 The partitioning algorithm uses a Greedy Approach to achieve load balancing, aiming to make the total estimated runtime of each partition as equal as possible.

 1. **Read Configuration**: The script reads all non-skipped test cases and their `estimated_time` from `config.yaml`.
-2. **Sort**: Test cases are sorted by `estimated_time` in descending order.
+2. **Sort(Balanced Assignment)**: Test cases are sorted by `estimated_time` in descending order. This ensures that the heaviest tasks are distributed first to achieve optimal load balancing across partitions.
 3. **Assign**: Iterating through the sorted test cases, each case is assigned to the partition (Bucket) with the current minimum total time.
+4. **Re-sort (Fast Feedback)**: Within each partition, tests are re-sorted by `estimated_time` in ascending order. This allows the CI to cover as many test cases as possible in the early stages.
+> TIP: If you need to prioritize a new test case, you can temporarily set its estimated_time to 0 to ensure it runs first, then update it to the actual value later.

 ### How to Modify Partitioning Logic
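The four steps described in this README hunk can be sketched roughly as follows. This is a minimal illustration, not the actual `auto_partition` implementation: the `SuiteFile` class, the `partition` function, the sample timings, and the tie-breaking rule (lowest bucket index wins) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SuiteFile:
    # Hypothetical stand-in for a test entry read from config.yaml.
    name: str
    estimated_time: float


def partition(files, size):
    """Greedy assignment, then an ascending re-sort inside each partition."""
    # Step 2: visit the heaviest tests first.
    order = sorted(
        range(len(files)),
        key=lambda i: files[i].estimated_time,
        reverse=True,
    )
    buckets = [[] for _ in range(size)]
    totals = [0.0] * size
    # Step 3: place each test in the bucket with the smallest running total.
    for i in order:
        target = totals.index(min(totals))
        buckets[target].append(i)
        totals[target] += files[i].estimated_time
    # Step 4: inside every bucket, run the shortest tests first.
    for bucket in buckets:
        bucket.sort(key=lambda i: files[i].estimated_time)
    return [[files[i] for i in bucket] for bucket in buckets]


files = [
    SuiteFile("a", 300),
    SuiteFile("b", 200),
    SuiteFile("c", 120),
    SuiteFile("d", 100),
    SuiteFile("e", 60),
]
parts = partition(files, 2)
```

With these sample timings, partition 0 runs `d` then `a` (400 in total) and partition 1 runs `e`, `c`, `b` (380 in total). The descending pass decides which partition a test lands in; the ascending re-sort changes only the order of execution within a partition, so the balance of totals is unaffected.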

.github/workflows/_e2e_test.yaml

Lines changed: 5 additions & 0 deletions
@@ -25,6 +25,7 @@ jobs:
     if: ${{ inputs.type == 'light' }}
     runs-on: linux-aarch64-a2b3-1
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:
@@ -89,6 +90,7 @@ jobs:
     if: ${{ inputs.type == 'full' }}
     runs-on: linux-aarch64-a2b3-1
     strategy:
+      fail-fast: false
       matrix:
         part: [0, 1]
     container:
@@ -153,6 +155,7 @@ jobs:
     if: ${{ inputs.type == 'light' }}
     runs-on: linux-aarch64-a3-2
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:
@@ -216,6 +219,7 @@ jobs:
     if: ${{ inputs.type == 'full' }}
     runs-on: linux-aarch64-a3-2
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:
@@ -287,6 +291,7 @@ jobs:
     if: ${{ inputs.type == 'full' }}
     runs-on: linux-aarch64-a3-4
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:

.github/workflows/scripts/run_suite.py

Lines changed: 2 additions & 1 deletion
@@ -74,6 +74,7 @@ def auto_partition(files, rank, size):

     # Return the files corresponding to the indices in the specified rank's partition
     indices = partitions[rank]
+    indices.sort(key=lambda i: files[i].estimated_time)
     return [files[i] for i in indices]


@@ -189,7 +190,7 @@ def main():
     arg_parser.add_argument(
         "--continue-on-error",
         action="store_true",
-        default=False,
+        default=True,
         help="Continue running remaining tests even if one fails (useful for nightly tests)",
     )
     args = arg_parser.parse_args()
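One consequence of the second hunk: with `action="store_true"` and `default=True`, `args.continue_on_error` is `True` whether or not the flag is passed, so early exit can no longer be requested from the command line. If opting back into early exit is ever wanted, one possible approach (not part of this commit) is `argparse.BooleanOptionalAction`, available in Python 3.9 and later. The `build_parser` helper below is a hypothetical name used only for this sketch.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction registers both --continue-on-error and
    # --no-continue-on-error, so a default of True can still be overridden.
    parser.add_argument(
        "--continue-on-error",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Continue running remaining tests even if one fails",
    )
    return parser


parser = build_parser()
default_args = parser.parse_args([])
strict_args = parser.parse_args(["--no-continue-on-error"])
```

Here `default_args.continue_on_error` is `True`, matching the new default in the commit, while `strict_args.continue_on_error` is `False`, restoring the stop-on-first-failure behavior for callers that ask for it.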
