
Updated the diffusion config issue and more test cases#937

Open
jingyu-ml wants to merge 9 commits into main from jingyux/jingyux-bug-fixed-5924267

Conversation

@jingyu-ml
Contributor

@jingyu-ml jingyu-ml commented Feb 26, 2026

What does this PR do?

Type of change: new tests, Bug fix

Overview:

  • Fixed the INT8 config issue

  • Added HF checkpoint export test coverage

The --hf-ckpt-dir export path previously had zero test coverage. This PR adds tests at two levels:

  1. Unit tests (tests/unit/torch/export/test_export_diffusers.py):
    • Extended test_export_diffusers_real_quantized to parametrize over INT8, INT8 SmoothQuant, FP8, and FP4 configs (previously only FP8). This gives 3 models x 4 configs = 12 test cases.
  2. GPU integration tests (tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py):
    • New file testing the full quantize.py --hf-ckpt-dir pipeline via subprocess with 4 combos:
      • SDXL INT8 SmoothQuant min-mean (the exact scenario that triggered the bug)
      • Flux INT8 SmoothQuant min-mean
      • SDXL FP8
      • Flux FP4
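As a hedged sketch of what the extended parametrization grid could look like: the model ids and config ids below are illustrative stand-ins, not the exact identifiers used in the test file.

```python
# Illustrative sketch only: model ids and config ids are assumptions,
# not the exact code added in this PR.
from itertools import product

MODELS = ["unet", "dit", "flux"]  # the three tiny diffusion test models

QUANT_CONFIG_IDS = ["int8", "int8_sq", "fp8", "fp4"]  # parametrized configs


def enumerate_cases():
    """Build the model x config grid the extended unit test iterates over."""
    return list(product(MODELS, QUANT_CONFIG_IDS))


# 3 models x 4 configs = 12 parameterized test cases, matching the count above
```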

Usage

# Add a code snippet demonstrating how to use this
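A plausible invocation of the pipeline exercised by the new tests is sketched below. Only `--hf-ckpt-dir`, the script name, and the model/format names are confirmed by this PR; the other flag names are assumptions.

```shell
# Hypothetical flags: only --hf-ckpt-dir is confirmed by this PR.
python examples/diffusers/quantization/quantize.py \
    --model sdxl-1.0 \
    --format int8 \
    --collect-method min-mean \
    --hf-ckpt-dir ./sdxl_int8_hf_ckpt
```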

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: No
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

Release Notes

  • Tests

    • Added test coverage for exporting Diffusers models with Hugging Face checkpoints across multiple quantization formats (INT8, FP8, FP4)
    • Extended quantization export testing to validate multiple configuration scenarios
  • Chores

    • Refined INT8 quantization configuration with improved calibrator support for convolution layers

@jingyu-ml jingyu-ml requested a review from a team as a code owner February 26, 2026 00:16
@jingyu-ml jingyu-ml self-assigned this Feb 26, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

Important

Review skipped: automatic incremental reviews are disabled on this repository. Check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
📝 Walkthrough

Configuration axis defaults updated from 0 to None for INT8 quantizers, with Conv2d input quantizers now receiving PercentileCalibrator settings. New test suite validates Hugging Face checkpoint export for Diffusers models across multiple quantization formats. Existing quantization unit tests extended with parameterized configuration coverage.
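The configuration change described above can be sketched as follows. This is an illustrative reconstruction, not the actual examples/diffusers/quantization/config.py; the calibrator key names and the percentile value are assumptions.

```python
# Illustrative reconstruction of the described change; exact keys and
# calibrator arguments in the real config.py may differ.
INT8_DEFAULT_CONFIG = {
    "quant_cfg": {
        "*weight_quantizer": {"num_bits": 8, "axis": None},  # was axis=0
        "*input_quantizer": {"num_bits": 8, "axis": None},   # was axis=0
    },
    "algorithm": "max",  # assumed default calibration algorithm
}


def reset_set_int8_config(quant_cfg, conv2d_layer_names):
    """Attach percentile-calibrator settings to Conv2d input quantizers only."""
    for name in conv2d_layer_names:
        quant_cfg[f"{name}.input_quantizer"] = {
            "num_bits": 8,
            "axis": None,
            "calibrator": "percentile",  # stands in for PercentileCalibrator
            "percentile": 99.9,          # assumed percentile value
        }
    return quant_cfg
```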

Changes

Quantization Configuration (examples/diffusers/quantization/config.py)
  • Modified INT8_DEFAULT_CONFIG quantization axes from 0 to None for weight and input quantizers. Updated reset_set_int8_config to add PercentileCalibrator configuration for Conv2d input quantizers while removing per-layer Linear handling logic.

HF Checkpoint Export Testing (tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py)
  • New test module with a DiffuserHfExportModel dataclass and a quantize_and_export_hf method. The parameterized test_diffusers_hf_ckpt_export validates HF checkpoint directory structure, config.json presence, and weight file formats across multiple Diffusers models (sdxl-1.0, flux-schnell) with int8, fp8, and fp4 quantization settings.

Parameterized Quantization Testing (tests/unit/torch/export/test_export_diffusers.py)
  • Extended test_export_diffusers_real_quantized with parameterization over multiple quantization configurations (config_id, quant_cfg). Added conditional FP4 block-size error handling with test skipping and minimum_sm gating for CI compatibility.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 20.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title Check ❓ Inconclusive: The title is vague and uses generic terms like 'more test cases' without clearly specifying the main change or the nature of the INT8 configuration fix. Make the title more specific and descriptive, e.g., 'Fix INT8 quantization config and add HF checkpoint export tests', to clearly convey both the bug fix and the test additions.

✅ Passed checks (1 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/unit/torch/export/test_export_diffusers.py (1)

112-117: Reasonable skip for tiny model incompatibility, but consider using pytest.skip message consistency.

The exception handling gracefully handles the case where tiny test models have weights incompatible with FP4 block quantization. However, catching a broad AssertionError and checking for a string pattern is fragile.

Consider whether the underlying code could raise a more specific exception type (e.g., ValueError or a custom BlockSizeError) to make this check more robust. As-is, if the assertion message text changes, this skip logic will break silently.
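The reviewer's suggestion could be sketched like this; BlockSizeError and the helper functions below are hypothetical names for illustration, not modelopt APIs.

```python
# Hypothetical sketch of the suggested fix: raise a specific exception from
# the export path and catch exactly that in the test. All names illustrative.
import pytest


class BlockSizeError(ValueError):
    """Raised when a weight dimension is not divisible by the FP4 block size."""


def validate_block_size(weight_dim, block_size=16):
    # stand-in for the validation inside the export helper
    if weight_dim % block_size != 0:
        raise BlockSizeError(
            f"weight dim {weight_dim} is not divisible by block size {block_size}"
        )


def maybe_skip_fp4(config_id, weight_dim):
    """Test-side pattern: skip only the fp4 case on block-size incompatibility."""
    try:
        validate_block_size(weight_dim)
    except BlockSizeError as err:
        if config_id == "fp4":
            pytest.skip(str(err))
        raise
```

Catching the narrow exception type keeps the skip logic robust even if the error message text changes.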

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/export/test_export_diffusers.py` around lines 112 - 117, The
test currently catches a broad AssertionError from export_hf_checkpoint and
inspects the message text, which is fragile; update the code so
export_hf_checkpoint (or the helper that validates quantization block size)
raises a specific exception (e.g., ValueError or a new BlockSizeError) when tiny
weights are incompatible with FP4 block quantization, and then change the test
to catch that specific exception (check for BlockSizeError or ValueError) and
call pytest.skip with the same message for config_id == "fp4"; reference the
export_hf_checkpoint function and the validation that detects "block size"
incompatibility when making the change.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4eacb0d and aba42e7.

📒 Files selected for processing (3)
  • examples/diffusers/quantization/config.py
  • tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py
  • tests/unit/torch/export/test_export_diffusers.py

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
@codecov

codecov bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.15%. Comparing base (a538f2e) to head (699bdc9).

Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #937      +/-   ##
==========================================
- Coverage   72.16%   72.15%   -0.01%     
==========================================
  Files         210      210              
  Lines       23522    23522              
==========================================
- Hits        16974    16973       -1     
- Misses       6548     6549       +1     
```

☔ View full report in Codecov by Sentry.
jingyu-ml and others added 2 commits February 26, 2026 05:33
Signed-off-by: Jingyu Xin <jingyux@nvidia.com>
```python
assert "quantization_config" in config_data


@pytest.mark.skipif(not torch.cuda.is_available(), reason="FP4 export requires NVIDIA GPU")
```
Collaborator


If a test needs GPU, it should be moved to tests/gpu/torch

Collaborator


This should go to tests/examples/diffusers



```python
@pytest.mark.parametrize(
    "model",
```
Contributor


should we cover Wan2.2 and LTX-2?

Contributor Author


Wan2.2 and LTX-2 are too big for CI; let's see if we have other options.

Contributor

Copilot AI left a comment


Pull request overview

This pull request fixes an INT8 quantization configuration issue in the diffusers quantization examples and adds comprehensive test coverage for the HuggingFace checkpoint export functionality. The changes address a bug in the INT8 config and ensure that the --hf-ckpt-dir export path, which previously had zero test coverage, is now properly tested across multiple quantization formats and models.

Changes:

  • Simplified INT8 quantization configuration by updating axis settings and streamlining the reset_set_int8_config function to only add PercentileCalibrator to Conv2d layers
  • Extended unit tests to cover INT8, INT8 SmoothQuant, FP8, and FP4 quantization formats across three diffusion model types (UNet, DiT, Flux)
  • Added GPU integration tests for end-to-end quantization and HF checkpoint export with SDXL and Flux models using INT8 SmoothQuant, FP8, and FP4 formats

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • examples/diffusers/quantization/config.py: Updated INT8_DEFAULT_CONFIG axis settings and simplified reset_set_int8_config to handle only Conv2d calibrators
  • tests/unit/torch/export/test_export_diffusers.py: Extended parametrization to test multiple quantization configs (INT8, INT8 SmoothQuant, FP8) and added a separate FP4 test
  • tests/gpu/torch/export/test_export_diffusers_hf_ckpt.py: New GPU integration test file with 4 test scenarios covering INT8 SmoothQuant, FP8, and FP4 with SDXL and Flux models


```diff
     "quant_cfg": {
-        "*weight_quantizer": {"num_bits": 8, "axis": 0},
-        "*input_quantizer": {"num_bits": 8, "axis": 0},
+        "*weight_quantizer": {"num_bits": 8, "axis": None},
```

Copilot AI Feb 27, 2026


The weight_quantizer axis should be 0 (per-channel quantization), not None (per-tensor quantization), to match the standard modelopt INT8_DEFAULT_CFG configuration. The modelopt INT8_DEFAULT_CFG uses axis=0 for weight quantization and axis=None for input quantization. This inconsistency could lead to suboptimal quantization quality for weights.

Suggested change:

```diff
-"*weight_quantizer": {"num_bits": 8, "axis": None},
+"*weight_quantizer": {"num_bits": 8, "axis": 0},
```

```python
        collect_method="default",
        model_dtype="BFloat16",
    ),
    marks=minimum_sm(89),
```

Copilot AI Feb 27, 2026


FP4 quantization requires compute capability 10.0 or higher (sm100), not sm89. The fp4_compatible() function in modelopt checks for torch.cuda.get_device_capability(0) >= (10, 0). Using minimum_sm(89) will cause the test to run on devices that don't actually support FP4, leading to failures or fallback behavior.

Suggested change:

```diff
-    marks=minimum_sm(89),
+    marks=minimum_sm(100),
```

Signed-off-by: Jingyu Xin <jingyux@nvidia.com>