
Updated the diffusion config issue and more test cases#937

Open
jingyu-ml wants to merge 9 commits into main from jingyux/jingyux-bug-fixed-5924267

Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Feb 26, 2026

What does this PR do?

Type of change: new tests, Bug fix

Overview:

  • Fixed the INT8 config issue

  • Added HF checkpoint export test coverage

The --hf-ckpt-dir export path previously had zero test coverage. This PR adds tests at two levels:

  1. Unit tests (tests/unit/torch/export/test_export_diffusers.py):
    • Extended test_export_diffusers_real_quantized to parametrize over INT8, INT8 SmoothQuant, FP8, and FP4 configs (previously only FP8). This gives 3 models x 4 configs = 12 test cases.
  2. GPU integration tests (tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py):
    • New file testing the full quantize.py --hf-ckpt-dir pipeline via subprocess with 4 combos:
      • SDXL INT8 SmoothQuant min-mean (the exact scenario that triggered the bug)
      • Flux INT8 SmoothQuant min-mean
      • SDXL FP8
      • Flux FP4
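As a hedged sketch of what the extended parametrization grid could look like: the model ids and config ids below are illustrative stand-ins, not the exact identifiers used in the test file.

```python
# Illustrative sketch only: model ids and config ids are assumptions,
# not the exact code added in this PR.
from itertools import product

MODELS = ["unet", "dit", "flux"]  # the three tiny diffusion test models

QUANT_CONFIG_IDS = ["int8", "int8_sq", "fp8", "fp4"]  # parametrized configs


def enumerate_cases():
    """Build the model x config grid the extended unit test iterates over."""
    return list(product(MODELS, QUANT_CONFIG_IDS))


# 3 models x 4 configs = 12 parameterized test cases, matching the count above
```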

Usage

# Add a code snippet demonstrating how to use this
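A plausible invocation of the pipeline exercised by the new tests is sketched below. Only `--hf-ckpt-dir`, the script name, and the model/format names are confirmed by this PR; the other flag names are assumptions.

```shell
# Hypothetical flags: only --hf-ckpt-dir is confirmed by this PR.
python examples/diffusers/quantization/quantize.py \
    --model sdxl-1.0 \
    --format int8 \
    --collect-method min-mean \
    --hf-ckpt-dir ./sdxl_int8_hf_ckpt
```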

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: No
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

Release Notes

  • Tests

    • Added test coverage for exporting Diffusers models with Hugging Face checkpoints across multiple quantization formats (INT8, FP8, FP4)
    • Extended quantization export testing to validate multiple configuration scenarios
  • Chores

    • Refined INT8 quantization configuration with improved calibrator support for convolution layers

@jingyu-ml jingyu-ml requested a review from a team as a code owner February 26, 2026 00:16
@jingyu-ml jingyu-ml self-assigned this Feb 26, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

Important

Review skipped: automatic incremental reviews are disabled on this repository. Check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
📝 Walkthrough

Configuration axis defaults updated from 0 to None for INT8 quantizers, with Conv2d input quantizers now receiving PercentileCalibrator settings. New test suite validates Hugging Face checkpoint export for Diffusers models across multiple quantization formats. Existing quantization unit tests extended with parameterized configuration coverage.
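The configuration change described above can be sketched as follows. This is an illustrative reconstruction, not the actual examples/diffusers/quantization/config.py; the calibrator key names and the percentile value are assumptions.

```python
# Illustrative reconstruction of the described change; exact keys and
# calibrator arguments in the real config.py may differ.
INT8_DEFAULT_CONFIG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": None},  # was axis=0
        "*input_quantizer": {"num_bits": 8, "axis": None},   # was axis=0
    },
    "algorithm": "max",  # assumed default calibration algorithm
}


def reset_set_int8_config(quant_cfg, conv2d_layer_names):
    """Attach percentile-calibrator settings to Conv2d input quantizers only."""
    for name in conv2d_layer_names:
        quant_cfg[f"{name}.input_quantizer"] = {
            "num_bits": 8,
            "axis": None,
            "calibrator": "percentile",  # stands in for PercentileCalibrator
            "percentile": 99.9,          # assumed percentile value
        }
    return quant_cfg
```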

Changes

Quantization Configuration (examples/diffusers/quantization/config.py)
  • Modified INT8_DEFAULT_CONFIG quantization axes from 0 to None for weight and input quantizers. Updated reset_set_int8_config to add PercentileCalibrator configuration for Conv2d input quantizers while removing per-layer Linear handling logic.

HF Checkpoint Export Testing (tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py)
  • New test module with a DiffuserHfExportModel dataclass and a quantize_and_export_hf method. The parameterized test_diffusers_hf_ckpt_export validates HF checkpoint directory structure, config.json presence, and weight file formats across multiple Diffusers models (sdxl-1.0, flux-schnell) with int8, fp8, and fp4 quantization settings.

Parameterized Quantization Testing (tests/unit/torch/export/test_export_diffusers.py)
  • Extended test_export_diffusers_real_quantized with parameterization over multiple quantization configurations (config_id, quant_cfg). Added conditional FP4 block-size error handling with test skipping and minimum_sm gating for CI compatibility.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 20.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title Check ❓ Inconclusive: The title is vague and uses generic terms like 'more test cases' without clearly specifying the main change or the nature of the INT8 configuration fix. Make the title more specific and descriptive, e.g., 'Fix INT8 quantization config and add HF checkpoint export tests', to clearly convey both the bug fix and the test additions.

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/unit/torch/export/test_export_diffusers.py (1)

112-117: Reasonable skip for tiny model incompatibility, but consider using pytest.skip message consistency.

The exception handling gracefully handles the case where tiny test models have weights incompatible with FP4 block quantization. However, catching a broad AssertionError and checking for a string pattern is fragile.

Consider whether the underlying code could raise a more specific exception type (e.g., ValueError or a custom BlockSizeError) to make this check more robust. As-is, if the assertion message text changes, this skip logic will break silently.
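The reviewer's suggestion could be sketched like this; BlockSizeError and the helper functions below are hypothetical names for illustration, not modelopt APIs.

```python
# Hypothetical sketch of the suggested fix: raise a specific exception from
# the export path and catch exactly that in the test. All names illustrative.
import pytest


class BlockSizeError(ValueError):
    """Raised when a weight dimension is not divisible by the FP4 block size."""


def validate_block_size(weight_dim, block_size=16):
    # stand-in for the validation inside the export helper
    if weight_dim % block_size != 0:
        raise BlockSizeError(
            f"weight dim {weight_dim} is not divisible by block size {block_size}"
        )


def maybe_skip_fp4(config_id, weight_dim):
    """Test-side pattern: skip only the fp4 case on block-size incompatibility."""
    try:
        validate_block_size(weight_dim)
    except BlockSizeError as err:
        if config_id == "fp4":
            pytest.skip(str(err))
        raise
```

Catching the narrow exception type keeps the skip logic robust even if the error message text changes.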

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/export/test_export_diffusers.py` around lines 112 - 117, The
test currently catches a broad AssertionError from export_hf_checkpoint and
inspects the message text, which is fragile; update the code so
export_hf_checkpoint (or the helper that validates quantization block size)
raises a specific exception (e.g., ValueError or a new BlockSizeError) when tiny
weights are incompatible with FP4 block quantization, and then change the test
to catch that specific exception (check for BlockSizeError or ValueError) and
call pytest.skip with the same message for config_id == "fp4"; reference the
export_hf_checkpoint function and the validation that detects "block size"
incompatibility when making the change.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4eacb0d and aba42e7.

📒 Files selected for processing (3)
  • examples/diffusers/quantization/config.py
  • tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py
  • tests/unit/torch/export/test_export_diffusers.py

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@codecov

codecov bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.15%. Comparing base (a538f2e) to head (699bdc9).

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #937      +/-   ##
==========================================
- Coverage   72.16%   72.15%   -0.01%     
==========================================
  Files         210      210              
  Lines       23522    23522              
==========================================
- Hits        16974    16973       -1     
- Misses       6548     6549       +1     
```

☔ View full report in Codecov by Sentry.
jingyu-ml and others added 2 commits February 26, 2026 05:33
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
```python
assert "quantization_config" in config_data


@pytest.mark.skipif(not torch.cuda.is_available(), reason="FP4 export requires NVIDIA GPU")
```
Collaborator


If a test needs GPU, it should be moved to tests/gpu/torch

Collaborator


This should go to tests/examples/diffusers



```python
@pytest.mark.parametrize(
    "model",
```
Contributor


should we cover Wan2.2 and LTX-2?

Contributor Author


Wan2.2 and LTX-2 are too big for CI; let's see if we have other options.

Contributor

Copilot AI left a comment


Pull request overview

This pull request fixes an INT8 quantization configuration issue in the diffusers quantization examples and adds comprehensive test coverage for the HuggingFace checkpoint export functionality. The changes address a bug in the INT8 config and ensure that the --hf-ckpt-dir export path, which previously had zero test coverage, is now properly tested across multiple quantization formats and models.

Changes:

  • Simplified INT8 quantization configuration by updating axis settings and streamlining the reset_set_int8_config function to only add PercentileCalibrator to Conv2d layers
  • Extended unit tests to cover INT8, INT8 SmoothQuant, FP8, and FP4 quantization formats across three diffusion model types (UNet, DiT, Flux)
  • Added GPU integration tests for end-to-end quantization and HF checkpoint export with SDXL and Flux models using INT8 SmoothQuant, FP8, and FP4 formats

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • examples/diffusers/quantization/config.py: Updated INT8_DEFAULT_CONFIG axis settings and simplified reset_set_int8_config to handle only Conv2d calibrators
  • tests/unit/torch/export/test_export_diffusers.py: Extended parametrization to test multiple quantization configs (INT8, INT8 SmoothQuant, FP8) and added a separate FP4 test
  • tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py: New GPU integration test file with 4 test scenarios covering INT8 SmoothQuant, FP8, and FP4 with SDXL and Flux models


```diff
     "quant_cfg": {
-        "*weight_quantizer": {"num_bits": 8, "axis": 0},
-        "*input_quantizer": {"num_bits": 8, "axis": 0},
+        "*weight_quantizer": {"num_bits": 8, "axis": None},
```

Copilot AI Feb 27, 2026


The weight_quantizer axis should be 0 (per-channel quantization), not None (per-tensor quantization), to match the standard modelopt INT8_DEFAULT_CFG configuration. The modelopt INT8_DEFAULT_CFG uses axis=0 for weight quantization and axis=None for input quantization. This inconsistency could lead to suboptimal quantization quality for weights.

Suggested change:

```diff
-"*weight_quantizer": {"num_bits": 8, "axis": None},
+"*weight_quantizer": {"num_bits": 8, "axis": 0},
```

```python
        collect_method="default",
        model_dtype="BFloat16",
    ),
    marks=minimum_sm(89),
```

Copilot AI Feb 27, 2026


FP4 quantization requires compute capability 10.0 or higher (sm100), not sm89. The fp4_compatible() function in modelopt checks for torch.cuda.get_device_capability(0) >= (10, 0). Using minimum_sm(89) will cause the test to run on devices that don't actually support FP4, leading to failures or fallback behavior.

Suggested change:

```diff
-    marks=minimum_sm(89),
+    marks=minimum_sm(100),
```

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>