Enable RTX Pro 6000 Blackwell runners for CI/CD by kevalmorabia97 · Pull Request #944 · NVIDIA/Model-Optimizer

kevalmorabia97 · 2026-02-27T10:38:11Z

What does this PR do?

Type of change: CI/CD Improvement

Updated CI/CD test matrix (new cuda13-gpu-trtllm dedicated job for gpu tests on trtllm container)

Workflow	Trigger	Test Matrix	GPU Runner
GPU tests	PR	`cuda13-gpu`, `cuda13-gpu-megatron`, `cuda13-gpu-trtllm`	1x RTX Pro 6000
GPU tests	Nightly	`cuda13-gpu`, `cuda13-gpu-megatron`, `cuda13-gpu-trtllm`	2x RTX Pro 6000
Example tests (torch)	PR	`llm_distill`, `llm_qat`, `llm_sparsity`, `speculative_decoding`	1x H100
Example tests (torch)	Nightly	`llm_distill`, `llm_qat`, `llm_sparsity`, `speculative_decoding`	2x RTX Pro 6000
Example tests (trtllm)	PR	`llm_ptq`, `vlm_ptq`	1x RTX Pro 6000
Example tests (trtllm)	Nightly	`llm_autodeploy`, `llm_eval`, `llm_ptq`, `vlm_ptq`	2x RTX Pro 6000
Example tests (onnx)	PR	`diffusers`, `torch_onnx`	1x L4
Example tests (onnx)	Nightly	`diffusers`, `torch_onnx`	2x RTX Pro 6000

Testing

Per PR tests pass in this PR
Nightly tests manually triggered: https://github.com/NVIDIA/Model-Optimizer/actions/runs/22495679199 (GPU tests), ? (Example tests)

Summary by CodeRabbit

Chores
- Unified and simplified CI test matrices and reduced duplicate workflow configuration.
- Updated GPU runner targets and container images for test jobs to newer GPU types.
Tests
- Added new GPU test variants and several new test modules.
- Simplified test gating: megatron auto-skip removed; tests now use a single dependency check for the mamba provider.
Tooling
- Added a new tox environment for an additional GPU test suite.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

coderabbitai · 2026-02-27T10:38:32Z

📝 Walkthrough

Walkthrough

The pull request refactors two GitHub Actions workflow files to introduce shared YAML anchors for GPU test strategies and consolidate runner configurations. Updates include adopting reusable matrix definitions, changing runner images from L4/H100 variants to RTX Pro 6000 variants, and adding container environment variables to GPU test jobs.

Changes

Cohort / File(s)	Summary
Example Tests Workflow `.github/workflows/example_tests.yml`	Introduces shared anchors (`&torch_strategy`, `&onnx_strategy`) for GPU test matrices to reduce duplication. Updates runner images: L4 → H100 for torch jobs; H100 → RTX Pro 6000 for ONNX and TensorRT-LLM jobs. TensorRT-LLM pr job matrix expanded to include `vlm_ptq` alongside `llm_ptq`. Multiple jobs refactored to reuse anchors via `*anchor` syntax.
GPU Tests Workflow `.github/workflows/gpu_tests.yml`	Introduces shared `&gpu_strategy` anchor applied to both gpu-tests-pr and gpu-tests-non-pr jobs. Runner updated from L4 to RTX Pro 6000 for pr job, H100 variant to RTX Pro 6000 for non-pr. Adds container environment variables (`GIT_DEPTH`, `PIP_CONSTRAINT`, `HF_TOKEN`) to pr job. Matrix definitions consolidated under shared anchor.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: enabling RTX Pro 6000 Blackwell runners for CI/CD, which aligns with the primary modifications shown in the changeset across GPU and example test workflows.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch kmorabia/rtxpro-cicd

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

.github/workflows/gpu_tests.yml (1)
70-70: Custom runner labels are valid for self-hosted runners — consider adding actionlint config.

The static analysis warnings about unknown runner labels (linux-amd64-gpu-rtxpro6000-latest-1, linux-amd64-gpu-rtxpro6000-latest-2) are false positives. These are custom labels for NVIDIA's self-hosted GPU runners.

To suppress these warnings in future CI runs, consider adding an .github/actionlint.yaml config file:
self-hosted-runner:
  labels:
    - linux-amd64-gpu-rtxpro6000-latest-1
    - linux-amd64-gpu-rtxpro6000-latest-2
    - linux-amd64-gpu-h100-latest-1
    - linux-amd64-gpu-l4-latest-1
Also applies to: 89-89
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/gpu_tests.yml at line 70, Add an actionlint configuration
file that declares the custom self-hosted runner labels used by the workflow
(the unknown labels are: linux-amd64-gpu-rtxpro6000-latest-1,
linux-amd64-gpu-rtxpro6000-latest-2, linux-amd64-gpu-h100-latest-1,
linux-amd64-gpu-l4-latest-1) so actionlint stops flagging them as invalid;
create or update the actionlint config named actionlint.yaml with a top-level
self-hosted-runner.labels array containing those label strings (or merge into
the existing actionlint config if present) to suppress the false positive
warnings for the gpu_tests.yml workflow.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.github/workflows/gpu_tests.yml:
- Line 70: Add an actionlint configuration file that declares the custom
self-hosted runner labels used by the workflow (the unknown labels are:
linux-amd64-gpu-rtxpro6000-latest-1, linux-amd64-gpu-rtxpro6000-latest-2,
linux-amd64-gpu-h100-latest-1, linux-amd64-gpu-l4-latest-1) so actionlint stops
flagging them as invalid; create or update the actionlint config named
actionlint.yaml with a top-level self-hosted-runner.labels array containing
those label strings (or merge into the existing actionlint config if present) to
suppress the false positive warnings for the gpu_tests.yml workflow.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35e6099 and c9ebe8c.

📒 Files selected for processing (2)

.github/workflows/example_tests.yml
.github/workflows/gpu_tests.yml

codecov · 2026-02-27T10:50:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.16%. Comparing base (35e6099) to head (0fb53c9).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #944      +/-   ##
==========================================
+ Coverage   72.15%   72.16%   +0.01%     
==========================================
  Files         210      210              
  Lines       23515    23515              
==========================================
+ Hits        16967    16970       +3     
+ Misses       6548     6545       -3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/gpu/torch/quantization/test_hadamard.py`:
- Around line 23-28: The module-level probe currently catches all exceptions;
change it to only skip for CUDA-unavailability errors by catching explicit
exception types (e.g., RuntimeError and torch.cuda.CudaError if available)
around the fast_hadamard_transform.hadamard_transform(torch.randn(1, 2,
device="cuda")) call, call pytest.skip(...) only for those exceptions, and
re-raise any other exceptions so real failures surface; reference the
fast_hadamard_transform.hadamard_transform call and pytest.skip usage when
making this change.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9ebe8c and 6a09835.

📒 Files selected for processing (17)

.github/workflows/_example_tests_runner.yml
.github/workflows/example_tests.yml
.github/workflows/gpu_tests.yml
tests/_test_utils/import_helper.py
tests/_test_utils/torch/megatron/models.py
tests/_test_utils/torch/megatron/utils.py
tests/gpu/torch/quantization/test_hadamard.py
tests/gpu_megatron/_extensions
tests/gpu_megatron/_extensions/test_torch_extensions.py
tests/gpu_megatron/torch/nas/plugins/test_megatron_mamba_dynamic_modules.py
tests/gpu_megatron/torch/prune/plugins/test_mcore_mamba_minitron_pruning.py
tests/gpu_trtllm/_extensions/test_torch_extensions.py
tests/gpu_trtllm/torch/quantization/backends/test_fp8_per_tensor_gemm.py
tests/gpu_trtllm/torch/quantization/backends/test_gemm_common.py
tests/gpu_trtllm/torch/quantization/backends/test_gemm_registry.py
tests/gpu_trtllm/torch/quantization/backends/test_nvfp4_gemm.py
tox.ini

💤 Files with no reviewable changes (2)

tests/gpu_megatron/_extensions
tests/_test_utils/torch/megatron/utils.py

✅ Files skipped from review due to trivial changes (1)

tests/gpu_trtllm/_extensions/test_torch_extensions.py

tests/gpu/torch/quantization/test_hadamard.py

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

Enable RTX Pro 6000 Blackwell runners for CI/CD

c9ebe8c

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 requested review from ChenhanYu and cjluo-nv February 27, 2026 10:38

kevalmorabia97 requested a review from a team as a code owner February 27, 2026 10:38

coderabbitai bot reviewed Feb 27, 2026

View reviewed changes

kevalmorabia97 force-pushed the kmorabia/rtxpro-cicd branch 3 times, most recently from 55956af to 6a09835 Compare February 27, 2026 13:05

coderabbitai bot reviewed Feb 27, 2026

View reviewed changes

tests/gpu/torch/quantization/test_hadamard.py Show resolved Hide resolved

Add trtllm specific gpu test job

0fb53c9

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 force-pushed the kmorabia/rtxpro-cicd branch from c5b4c21 to 0fb53c9 Compare February 27, 2026 17:37

kevalmorabia97 requested a review from realAsma February 27, 2026 22:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable RTX Pro 6000 Blackwell runners for CI/CD#944

Enable RTX Pro 6000 Blackwell runners for CI/CD#944
kevalmorabia97 wants to merge 2 commits intomainfrom
kmorabia/rtxpro-cicd

kevalmorabia97 commented Feb 27, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

codecov bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kevalmorabia97 commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kevalmorabia97 commented Feb 27, 2026 •

edited

Loading

coderabbitai bot commented Feb 27, 2026 •

edited

Loading

codecov bot commented Feb 27, 2026 •

edited

Loading