Updated the diffusion config issue and more test cases #937
Conversation
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
📝 Walkthrough

Configuration axis defaults updated from 0 to None for INT8 quantizers, with Conv2d input quantizers now receiving PercentileCalibrator settings. New test suite validates Hugging Face checkpoint export for Diffusers models across multiple quantization formats. Existing quantization unit tests extended with parameterized configuration coverage.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
🧹 Nitpick comments (1)

tests/unit/torch/export/test_export_diffusers.py (1)

112-117: Reasonable skip for tiny model incompatibility, but consider using a consistent pytest.skip message. The exception handling gracefully covers the case where tiny test models have weights incompatible with FP4 block quantization. However, catching a broad AssertionError and checking for a string pattern is fragile. Consider whether the underlying code could raise a more specific exception type (e.g., ValueError or a custom BlockSizeError) to make this check more robust. As-is, if the assertion message text changes, this skip logic will break silently.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/torch/export/test_export_diffusers.py` around lines 112-117: the test currently catches a broad AssertionError from export_hf_checkpoint and inspects the message text, which is fragile. Update the code so export_hf_checkpoint (or the helper that validates the quantization block size) raises a specific exception (e.g., ValueError or a new BlockSizeError) when tiny weights are incompatible with FP4 block quantization, then change the test to catch that specific exception and call pytest.skip with the same message for config_id == "fp4"; reference the export_hf_checkpoint function and the validation that detects "block size" incompatibility when making the change.
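The refactor the nitpick asks for could look like the sketch below. BlockSizeError and validate_block_size are hypothetical names standing in for export_hf_checkpoint's internal validation; the real modelopt code may be organized differently.

```python
import pytest


class BlockSizeError(ValueError):
    """Raised when a weight shape is incompatible with the FP4 block size."""


def validate_block_size(dim: int, block_size: int) -> None:
    # Stand-in for the block-size check inside the export path: raise a
    # specific exception instead of a bare AssertionError with message text.
    if dim % block_size != 0:
        raise BlockSizeError(
            f"weight dim {dim} is not divisible by block size {block_size}"
        )


def test_fp4_export(config_id: str = "fp4") -> None:
    try:
        validate_block_size(dim=8, block_size=16)  # tiny test-model dims
    except BlockSizeError as err:
        # Catch the specific type rather than matching a string pattern.
        if config_id == "fp4":
            pytest.skip(f"tiny model incompatible with FP4 block quant: {err}")
        raise
```

Because the test now matches on the exception type, a reworded assertion message can no longer silently disable the skip.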
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- examples/diffusers/quantization/config.py
- tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py
- tests/unit/torch/export/test_export_diffusers.py
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@ Coverage Diff @@
## main #937 +/- ##
==========================================
- Coverage 72.16% 72.15% -0.01%
==========================================
Files 210 210
Lines 23522 23522
==========================================
- Hits 16974 16973 -1
- Misses 6548 6549 +1

☔ View full report in Codecov by Sentry.
assert "quantization_config" in config_data

@pytest.mark.skipif(not torch.cuda.is_available(), reason="FP4 export requires NVIDIA GPU")
If a test needs GPU, it should be moved to tests/gpu/torch.

This should go to tests/examples/diffusers.
@pytest.mark.parametrize(
    "model",
Should we cover Wan2.2 and LTX-2?

Wan2.2 and LTX-2 are too big for CI; let's see if we have other options.
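One option for the exchange above: keep the large models in the parametrization but gate them behind an opt-in switch so default CI skips them. The RUN_LARGE_MODELS environment variable is an assumed convention, not an existing project flag, and the small-model list is illustrative.

```python
import os

import pytest

# Assumed opt-in gate: large models only run when RUN_LARGE_MODELS is set.
_SKIP_LARGE = pytest.mark.skipif(
    not os.environ.get("RUN_LARGE_MODELS"),
    reason="model too big for default CI",
)

MODEL_PARAMS = [
    "sdxl",  # assumed small enough for CI
    "flux",
    pytest.param("wan2.2", marks=_SKIP_LARGE),
    pytest.param("ltx-2", marks=_SKIP_LARGE),
]


@pytest.mark.parametrize("model", MODEL_PARAMS)
def test_quantize_and_export(model):
    assert isinstance(model, str)  # placeholder for the real quantize+export flow
```

A scheduled or manually triggered job can then export RUN_LARGE_MODELS=1 to exercise Wan2.2 and LTX-2 without slowing every PR.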
Pull request overview
This pull request fixes an INT8 quantization configuration issue in the diffusers quantization examples and adds comprehensive test coverage for the HuggingFace checkpoint export functionality. The changes address a bug in the INT8 config and ensure that the --hf-ckpt-dir export path, which previously had zero test coverage, is now properly tested across multiple quantization formats and models.
Changes:
- Simplified INT8 quantization configuration by updating axis settings and streamlining the reset_set_int8_config function to only add PercentileCalibrator to Conv2d layers
- Extended unit tests to cover INT8, INT8 SmoothQuant, FP8, and FP4 quantization formats across three diffusion model types (UNet, DiT, Flux)
- Added GPU integration tests for end-to-end quantization and HF checkpoint export with SDXL and Flux models using INT8 SmoothQuant, FP8, and FP4 formats
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| examples/diffusers/quantization/config.py | Updated INT8_DEFAULT_CONFIG axis settings and simplified reset_set_int8_config to only handle Conv2d calibrators |
| tests/unit/torch/export/test_export_diffusers.py | Extended parametrization to test multiple quantization configs (INT8, INT8 SmoothQuant, FP8) and added separate FP4 test |
| tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py | New GPU integration test file with 4 test scenarios covering INT8 SmoothQuant, FP8, and FP4 with SDXL and Flux models |
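The config.py change described in the table can be sketched roughly as follows. The dict layout and the PercentileCalibrator key names are assumptions inferred from the walkthrough, not the actual modelopt schema, and the layer-name list stands in for walking the real model.

```python
# Assumed shape of the example's INT8 config after the change: per-channel
# weights (axis=0), per-tensor activations (axis=None).
INT8_DEFAULT_CONFIG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": 0},
        "*input_quantizer": {"num_bits": 8, "axis": None},
    },
    "algorithm": "max",
}


def reset_set_int8_config(config, conv2d_layer_names, percentile=99.9):
    """Attach percentile-calibrator settings to Conv2d input quantizers only.

    Linear layers keep the default max calibrator; only the named Conv2d
    layers get per-layer overrides. Mutates and returns `config`.
    """
    for name in conv2d_layer_names:
        config["quant_cfg"][f"*{name}*input_quantizer"] = {
            "num_bits": 8,
            "axis": None,
            "calibrator": "PercentileCalibrator",  # assumed key name
            "percentile": percentile,
        }
    return config
```

Restricting the percentile calibrator to Conv2d inputs matches the overview's claim that the function was simplified to handle only Conv2d calibrators.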
  "quant_cfg": {
-     "*weight_quantizer": {"num_bits": 8, "axis": 0},
-     "*input_quantizer": {"num_bits": 8, "axis": 0},
+     "*weight_quantizer": {"num_bits": 8, "axis": None},
The weight_quantizer axis should be 0 (per-channel quantization), not None (per-tensor quantization), to match the standard modelopt INT8_DEFAULT_CFG configuration. INT8_DEFAULT_CFG uses axis=0 for weight quantization and axis=None for input quantization. This inconsistency could lead to suboptimal quantization quality for weights.

Suggested change:
-     "*weight_quantizer": {"num_bits": 8, "axis": None},
+     "*weight_quantizer": {"num_bits": 8, "axis": 0},
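Why the axis matters can be shown with a tiny pure-Python stand-in for the quantizer's amax reduction (modelopt's internals differ): axis=0 keeps one scale per output channel, while axis=None collapses everything to a single per-tensor scale, washing out channels with small ranges.

```python
# Two output channels with very different dynamic ranges.
w = [
    [0.1, -0.2],   # channel 0: small range
    [4.0, -8.0],   # channel 1: large range
]

# axis=None: one amax (and hence one scale) for the whole tensor.
per_tensor_amax = max(abs(v) for row in w for v in row)

# axis=0: one amax per output channel, reducing over the remaining axes.
per_channel_amax = [max(abs(v) for v in row) for row in w]

# With a single scale derived from 8.0, channel 0's values occupy only ~2%
# of the int8 range; per-channel scales [0.2, 8.0] preserve its resolution.
```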
      collect_method="default",
      model_dtype="BFloat16",
  ),
  marks=minimum_sm(89),
FP4 quantization requires compute capability 10.0 or higher (sm100), not sm89. The fp4_compatible() function in modelopt checks for torch.cuda.get_device_capability(0) >= (10, 0). Using minimum_sm(89) will cause the test to run on devices that don't actually support FP4, leading to failures or fallback behavior.

Suggested change:
-     marks=minimum_sm(89),
+     marks=minimum_sm(100),
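A minimum_sm-style mark built on the same capability check the comment cites might look like the sketch below. The helper names are assumptions mirroring the test utility referenced in the diff, not the project's actual implementation; the torch import is guarded so the sketch degrades to "no GPU" when torch is absent.

```python
import pytest

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # torch absent: treat as no GPU
    _HAS_CUDA = False


def device_sm() -> int:
    """Compute capability as an sm number (e.g. 89, 100); 0 without a GPU."""
    if not _HAS_CUDA:
        return 0
    major, minor = torch.cuda.get_device_capability(0)
    return major * 10 + minor


def minimum_sm(required: int):
    # Skip when the device's capability is below the requirement, so
    # minimum_sm(100) matches fp4_compatible()'s (10, 0) threshold.
    return pytest.mark.skipif(
        device_sm() < required,
        reason=f"requires compute capability sm{required} or newer",
    )
```

With this gate, minimum_sm(100) skips the FP4 test on sm89 hardware instead of letting it fail or silently fall back.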
What does this PR do?
Type of change: new tests, Bug fix
Overview:
- Fixed the INT8 config issue
- Added HF checkpoint export test coverage

The --hf-ckpt-dir export path had zero test coverage. This MR adds tests at two levels:

- Unit tests (tests/unit/torch/export/test_export_diffusers.py)
- GPU integration tests (tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py)
Usage

# Add a code snippet demonstrating how to use this

Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes
Tests
Chores