@bgopesh bgopesh commented Jan 27, 2026

Motivation

https://ontrack-internal.amd.com/browse/SWDEV-576770

Technical Details

JIRA ID

Test Plan

Test Result

Submission Checklist

Copilot AI left a comment

Pull request overview

This PR fixes the handling of F16 (half-precision float) instructions in the AQL profile integration tests on gfx11xx (RDNA3) and gfx12xx (RDNA4) targets. These targets default to Real16 mode, which requires the .set fake16 directive to use legacy F16 instructions.

Changes:

  • Added preprocessor macros to detect the affected architectures (gfx11xx and gfx12xx variants)
  • Updated inline assembly blocks that use F16 instructions to emit the .set fake16 directive on those architectures
  • Applied the fix to both the iops_kernel_trans and iops_kernel1 functions
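
For illustration, the detection-plus-directive pattern described in the list above could look roughly like the sketch below. This is not the PR's code: the `EXAMPLE_*` macro names and the `example_asm_text` helper are hypothetical, the listed `__gfxXXXX__` defines are only an illustrative subset, and the `.set fake16` text is copied from the PR description without being checked against an assembler. On a host build none of the target defines are set, so the prefix is empty.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical detection macro: true when compiling device code for one of
// the enumerated targets. The list is an illustrative subset, not the PR's.
#if defined(__gfx1100__) || defined(__gfx1101__) || defined(__gfx1102__) || \
    defined(__gfx1200__) || defined(__gfx1201__)
#define EXAMPLE_NEEDS_FAKE16 1
#else
#define EXAMPLE_NEEDS_FAKE16 0
#endif

// Directive text as quoted in the PR description (assumption: unverified).
#if EXAMPLE_NEEDS_FAKE16
#define EXAMPLE_F16_ASM_PREFIX ".set fake16\n"
#else
#define EXAMPLE_F16_ASM_PREFIX ""
#endif

// Adjacent string literals are concatenated by the compiler, so the prefix
// is prepended to the assembly body at compile time.
#define EXAMPLE_F16_ASM(body) EXAMPLE_F16_ASM_PREFIX body

// Returns the text that would be handed to an inline asm statement.
// The instruction and operand placeholders are illustrative only.
inline const char* example_asm_text() {
  return EXAMPLE_F16_ASM("v_add_f16 %0, %1, %2\n");
}
```

Keeping the directive in a single prefix macro means each inline assembly block only needs to be wrapped once, and extending the target list touches one place.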

Comment on lines +40 to +41
// Helper macro to detect RDNA3 (gfx11xx) architectures
// These architectures default to Real16 mode and require .set fake16 for legacy F16 instructions
Copilot AI Jan 27, 2026

The hardcoded list of architecture defines makes this difficult to maintain as new RDNA3 variants are added. Consider checking for a common gfx11 or gfx12 family define if available, or documenting the rationale for this specific list in a comment.

Suggested change
- // Helper macro to detect RDNA3 (gfx11xx) architectures
- // These architectures default to Real16 mode and require .set fake16 for legacy F16 instructions
+ // Helper macro to detect RDNA3 (gfx11xx) architectures.
+ // These architectures default to Real16 mode and require .set fake16 for legacy F16 instructions.
+ //
+ // NOTE: This list is intentionally explicit rather than using a hypothetical common
+ // "gfx11/gfx12 family" preprocessor macro. Current HIP/Clang toolchains used by this
+ // test do not provide a stable family-wide define that covers all RDNA3 parts with
+ // Real16 defaults, so we enumerate the known SKUs individually to avoid mis-detection.
+ // When adding support for new RDNA3/gfx11/gfx12 variants that behave like these
+ // architectures, extend the list below with the corresponding __gfxXXXX__ defines.
