Skip to content

Conversation

@vlaindic
Copy link
Member

Motivation

To ease future maintenance, I'm proposing this PR for simplifying 2nd level trap handler. This PR aims to ease future integration of new architectures.

Technical Details

JIRA ID

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings January 26, 2026 13:42
@vlaindic vlaindic requested a review from cfreeamd as a code owner January 26, 2026 13:42
@vlaindic vlaindic requested a review from jlgreathouse January 26, 2026 13:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR simplifies the second-level trap handler for PC (Program Counter) sampling on GFX12 architecture. The refactoring consolidates common code paths between host-trap and stochastic sampling modes, improves code organization, and adds clearer documentation throughout the assembly code.

Changes:

  • Merged common sample buffer filling logic between host-trap and stochastic sampling paths
  • Reorganized register usage and buffer management to reduce code duplication
  • Enhanced inline documentation with detailed register state tracking and clearer comments

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// Check if we should signal the host (only trap handler instances that observes ttmp4 == tmp5 signals host)
s_cmp_lg_u32 ttmp4, ttmp5 // Compare buff_written_val with watermark (fails if ttmp4 == ttmp5)

// FIMXE: If we keep this piece of code, is it possible that multiple signals are sent for the same buffer???
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'FIMXE' to 'FIXME'.

Suggested change
// FIMXE: If we keep this piece of code, is it possible that multiple signals are sent for the same buffer???
// FIXME: If we keep this piece of code, is it possible that multiple signals are sent for the same buffer???

Copilot uses AI. Check for mistakes.
Comment on lines +915 to +916
// TODO: check whether we need additional checking if either Host-Trap or Stochastic is set
// (probably no, as we don't support both at the same time)
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO comment suggests uncertainty about clearing exception flags when both Host-Trap and Stochastic modes might be set. This should be clarified with explicit validation or removed if unnecessary.

Suggested change
// TODO: check whether we need additional checking if either Host-Trap or Stochastic is set
// (probably no, as we don't support both at the same time)
// Host-Trap and Stochastic (perf-snapshot) modes are configured as mutually exclusive,
// so this path can safely acknowledge the event by clearing both relevant exception flags.

Copilot uses AI. Check for mistakes.
@vlaindic vlaindic requested review from Copilot and removed request for shwetagkhatri January 26, 2026 13:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

.set SQ_WAVE_EXCP_FLAG_PRIV_HT_SHIFT , 7
.set SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT , 8
.set SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_SHIFT , 9
.set SQ_WAVE_EXCP_FLAG_PRIV_PERF_SNAPSHOT_SHIFT , 10
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent naming: this constant is renamed from SQ_WAVE_EXCP_FLAG_PRIV_PERF_SNAPSHOT to SQ_WAVE_EXCP_FLAG_PRIV_PERF_SNAPSHOT_SHIFT to include the _SHIFT suffix, but line 55 introduces SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT without a _SHIFT suffix despite being at bit position 5. Consider adding _SHIFT suffix for consistency or clarifying why this constant differs.

Copilot uses AI. Check for mistakes.
Comment on lines +623 to +624
v_and_b32 v3, ttmp13, 0x0f000000 // Mask to extract WAVE_ID[27:24]
v_lshl_or_b32 v2, v3, 4, v2 // Shift WAVE_ID to bits [4:0]
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states 'Shift WAVE_ID to bits [4:0]' but the mask extracts bits [27:24] from HW_ID2 (VM_ID according to line 610), not WAVE_ID. Based on the HW_ID layout documented in lines 599-610, bits [27:24] of HW_ID2 should be VM_ID, and the shift left by 4 would place it in bits [31:28] of the result, not [4:0]. Either the comment or the code logic appears incorrect.

Suggested change
v_and_b32 v3, ttmp13, 0x0f000000 // Mask to extract WAVE_ID[27:24]
v_lshl_or_b32 v2, v3, 4, v2 // Shift WAVE_ID to bits [4:0]
v_and_b32 v3, ttmp13, 0x0f000000 // Mask to extract VM_ID[27:24]
v_lshl_or_b32 v2, v3, 4, v2 // Shift VM_ID to bits [31:28]

Copilot uses AI. Check for mistakes.

v_mov_b32 v0, 0 // want to atomic increment
v_mov_b32 v1, 1 // buf_written_valX

Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line contains only a space character. Remove the trailing whitespace to follow assembly code style conventions.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines +706 to +708
// Clear out 7 MSBs of PC_HI (used as scratch bits in ttmp1)
v_writelane_b32 v3, ttmp1, 0 // v[3] = PC_HI
v_and_b32 v3, v3, SQ_WAVE_PC_HI_ADDRESS_MASK // clear out scratch bits
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 706 states 'Clear out 7 MSBs of PC_HI' but this is inconsistent with the HW_ID layout. According to line 80, SQ_WAVE_PC_HI_TRAP_ID_MASK occupies bits [31:28] (4 MSBs for trap ID). The comment should either specify '4 MSBs' or clarify what the additional 3 bits represent. Additionally, verify that SQ_WAVE_PC_HI_ADDRESS_MASK correctly masks only the address portion.

Suggested change
// Clear out 7 MSBs of PC_HI (used as scratch bits in ttmp1)
v_writelane_b32 v3, ttmp1, 0 // v[3] = PC_HI
v_and_b32 v3, v3, SQ_WAVE_PC_HI_ADDRESS_MASK // clear out scratch bits
// Clear out trap ID bits (upper 4 bits [31:28]) of PC_HI using the address mask
v_writelane_b32 v3, ttmp1, 0 // v[3] = PC_HI
v_and_b32 v3, v3, SQ_WAVE_PC_HI_ADDRESS_MASK // clear out trap ID, keep address bits

Copilot uses AI. Check for mistakes.
@vlaindic vlaindic marked this pull request as draft January 26, 2026 13:48
@vlaindic vlaindic requested a review from Copilot January 26, 2026 14:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


.set SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK , (1 << 4) - 1
.set SQ_WAVE_EXCP_FLAG_PRIV_MEMVIOL_SHIFT , 4
.set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT is defined but never used in this file. If it's not used, it should be removed to avoid confusion. If it's intended for future use, add a comment explaining its purpose.

Suggested change
.set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5
.set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5 // Reserved/defined for potential future use in context-save related traps

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be: SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT ?

.set SQ_WAVE_PC_HI_TRAP_ID_BFE , (SQ_WAVE_PC_HI_TRAP_ID_SHIFT | (SQ_WAVE_PC_HI_TRAP_ID_SIZE << 16))
.set SQ_WAVE_PC_HI_TRAP_ID_SHIFT , 28
.set SQ_WAVE_PC_HI_TRAP_ID_SIZE , 4
.set SQ_WAVE_PC_HI_TRAP_ID_MASK , (((1 << SQ_WAVE_PC_HI_TRAP_ID_SIZE) - 1) << SQ_WAVE_PC_HI_TRAP_ID_SHIFT)
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant SQ_WAVE_PC_HI_TRAP_ID_MASK is defined but never used in this file. If it's not used, it should be removed to avoid confusion. If it's intended for future use, add a comment explaining its purpose.

Suggested change
.set SQ_WAVE_PC_HI_TRAP_ID_MASK , (((1 << SQ_WAVE_PC_HI_TRAP_ID_SIZE) - 1) << SQ_WAVE_PC_HI_TRAP_ID_SHIFT)

Copilot uses AI. Check for mistakes.

s_wait_storecnt 0 // wait for signal value 0 to be written to amd_signal_t.value

v_writelane_b32 v0, ttmp13, 0x0 // v[0] = 0 (event ID low part)
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "v[0] = 0 (event ID low part)" but the instruction is writing ttmp13 (which contains the event_id loaded on line 844) to v0, not the value 0. The comment should be corrected to indicate that v[0] receives the event_id value.

Suggested change
v_writelane_b32 v0, ttmp13, 0x0 // v[0] = 0 (event ID low part)
v_writelane_b32 v0, ttmp13, 0x0 // v[0] = event_id (low part)

Copilot uses AI. Check for mistakes.
s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message
s_wait_kmcnt 0
s_mov_b32 m0, ttmp14 // restore m0
s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "Backup m0 (event ID low part)" but at this point m0 doesn't contain the event ID yet - it still contains whatever was in m0 before. The event ID is moved into m0 on line 865. The comment should be corrected to say "Backup m0 to ttmp14" without the misleading "(event ID low part)" part.

Suggested change
s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14
s_mov_b32 ttmp14, m0 // Backup m0 to ttmp14

Copilot uses AI. Check for mistakes.
Comment on lines +863 to +868
s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14
v_readlane_b32 ttmp15, v0, 0 // Read event ID low part from v0 into ttmp15
s_mov_b32 m0, ttmp15 // Set m0 to event ID (low part)
s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message to host
s_wait_kmcnt 0 // Wait for interrupt to complete
s_mov_b32 m0, ttmp14 // Restore m0 to original value (event ID low part)
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "Restore m0 to original value (event ID low part)" but ttmp14 contains the original m0 value (saved on line 863), not the event ID. The comment should be corrected to say "Restore m0 to original value" without the misleading "(event ID low part)" part.

Suggested change
s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14
v_readlane_b32 ttmp15, v0, 0 // Read event ID low part from v0 into ttmp15
s_mov_b32 m0, ttmp15 // Set m0 to event ID (low part)
s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message to host
s_wait_kmcnt 0 // Wait for interrupt to complete
s_mov_b32 m0, ttmp14 // Restore m0 to original value (event ID low part)
s_mov_b32 ttmp14, m0 // Backup m0 to ttmp14
v_readlane_b32 ttmp15, v0, 0 // Read event ID low part from v0 into ttmp15
s_mov_b32 m0, ttmp15 // Set m0 to event ID (low part)
s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message to host
s_wait_kmcnt 0 // Wait for interrupt to complete
s_mov_b32 m0, ttmp14 // Restore m0 to original value

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@dayatsin-amd dayatsin-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit picks


.set SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK , (1 << 4) - 1
.set SQ_WAVE_EXCP_FLAG_PRIV_MEMVIOL_SHIFT , 4
.set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be: SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT ?

.set SQ_WAVE_TRAP_CTRL_MATH_EXCP_MASK , ((1 << 7) - 1)
.set SQ_WAVE_TRAP_CTRL_ADDR_WATCH_SHIFT , 7
.set SQ_WAVE_TRAP_CTRL_WAVE_END_SHIFT , 8
.set SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST , 9
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already there before your patch, but should this be SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_SHIFT ?

@vlaindic vlaindic mentioned this pull request Jan 27, 2026
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants