-
Notifications
You must be signed in to change notification settings - Fork 119
Simplifying PC sampling part of the 2nd lvl TH #2857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR simplifies the second-level trap handler for PC (Program Counter) sampling on GFX12 architecture. The refactoring consolidates common code paths between host-trap and stochastic sampling modes, improves code organization, and adds clearer documentation throughout the assembly code.
Changes:
- Merged common sample buffer filling logic between host-trap and stochastic sampling paths
- Reorganized register usage and buffer management to reduce code duplication
- Enhanced inline documentation with detailed register state tracking and clearer comments
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
projects/rocr-runtime/runtime/hsa-runtime/core/runtime/trap_handler/trap_handler_gfx12.s
Outdated
Show resolved
Hide resolved
| // Check if we should signal the host (only trap handler instances that observes ttmp4 == tmp5 signals host) | ||
| s_cmp_lg_u32 ttmp4, ttmp5 // Compare buff_written_val with watermark (fails if ttmp4 == ttmp5) | ||
|
|
||
| // FIMXE: If we keep this piece of code, is it possible that multiple signals are sent for the same buffer??? |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'FIMXE' to 'FIXME'.
| // FIMXE: If we keep this piece of code, is it possible that multiple signals are sent for the same buffer??? | |
| // FIXME: If we keep this piece of code, is it possible that multiple signals are sent for the same buffer??? |
projects/rocr-runtime/runtime/hsa-runtime/core/runtime/trap_handler/trap_handler_gfx12.s
Outdated
Show resolved
Hide resolved
| // TODO: check whether we need additional checking if either Host-Trap or Stochastic is set | ||
| // (probably no, as we don't support both at the same time) |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO comment suggests uncertainty about clearing exception flags when both Host-Trap and Stochastic modes might be set. This should be clarified with explicit validation or removed if unnecessary.
| // TODO: check whether we need additional checking if either Host-Trap or Stochastic is set | |
| // (probably no, as we don't support both at the same time) | |
| // Host-Trap and Stochastic (perf-snapshot) modes are configured as mutually exclusive, | |
| // so this path can safely acknowledge the event by clearing both relevant exception flags. |
projects/rocr-runtime/runtime/hsa-runtime/core/runtime/trap_handler/trap_handler_gfx12.s
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| .set SQ_WAVE_EXCP_FLAG_PRIV_HT_SHIFT , 7 | ||
| .set SQ_WAVE_EXCP_FLAG_PRIV_WAVE_START_SHIFT , 8 | ||
| .set SQ_WAVE_EXCP_FLAG_PRIV_WAVE_END_SHIFT , 9 | ||
| .set SQ_WAVE_EXCP_FLAG_PRIV_PERF_SNAPSHOT_SHIFT , 10 |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent naming: this constant is renamed from SQ_WAVE_EXCP_FLAG_PRIV_PERF_SNAPSHOT to SQ_WAVE_EXCP_FLAG_PRIV_PERF_SNAPSHOT_SHIFT to include the _SHIFT suffix, but line 55 introduces SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT without a _SHIFT suffix despite being at bit position 5. Consider adding _SHIFT suffix for consistency or clarifying why this constant differs.
| v_and_b32 v3, ttmp13, 0x0f000000 // Mask to extract WAVE_ID[27:24] | ||
| v_lshl_or_b32 v2, v3, 4, v2 // Shift WAVE_ID to bits [4:0] |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states 'Shift WAVE_ID to bits [4:0]' but the mask extracts bits [27:24] from HW_ID2 (VM_ID according to line 610), not WAVE_ID. Based on the HW_ID layout documented in lines 599-610, bits [27:24] of HW_ID2 should be VM_ID, and the shift left by 4 would place it in bits [31:28] of the result, not [4:0]. Either the comment or the code logic appears incorrect.
| v_and_b32 v3, ttmp13, 0x0f000000 // Mask to extract WAVE_ID[27:24] | |
| v_lshl_or_b32 v2, v3, 4, v2 // Shift WAVE_ID to bits [4:0] | |
| v_and_b32 v3, ttmp13, 0x0f000000 // Mask to extract VM_ID[27:24] | |
| v_lshl_or_b32 v2, v3, 4, v2 // Shift VM_ID to bits [31:28] |
|
|
||
| v_mov_b32 v0, 0 // want to atomic increment | ||
| v_mov_b32 v1, 1 // buf_written_valX | ||
|
|
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty line contains only a space character. Remove the trailing whitespace to follow assembly code style conventions.
| // Clear out 7 MSBs of PC_HI (used as scratch bits in ttmp1) | ||
| v_writelane_b32 v3, ttmp1, 0 // v[3] = PC_HI | ||
| v_and_b32 v3, v3, SQ_WAVE_PC_HI_ADDRESS_MASK // clear out scratch bits |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 706 states 'Clear out 7 MSBs of PC_HI' but this is inconsistent with the HW_ID layout. According to line 80, SQ_WAVE_PC_HI_TRAP_ID_MASK occupies bits [31:28] (4 MSBs for trap ID). The comment should either specify '4 MSBs' or clarify what the additional 3 bits represent. Additionally, verify that SQ_WAVE_PC_HI_ADDRESS_MASK correctly masks only the address portion.
| // Clear out 7 MSBs of PC_HI (used as scratch bits in ttmp1) | |
| v_writelane_b32 v3, ttmp1, 0 // v[3] = PC_HI | |
| v_and_b32 v3, v3, SQ_WAVE_PC_HI_ADDRESS_MASK // clear out scratch bits | |
| // Clear out trap ID bits (upper 4 bits [31:28]) of PC_HI using the address mask | |
| v_writelane_b32 v3, ttmp1, 0 // v[3] = PC_HI | |
| v_and_b32 v3, v3, SQ_WAVE_PC_HI_ADDRESS_MASK // clear out trap ID, keep address bits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| .set SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK , (1 << 4) - 1 | ||
| .set SQ_WAVE_EXCP_FLAG_PRIV_MEMVIOL_SHIFT , 4 | ||
| .set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5 |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constant SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT is defined but never used in this file. If it's not used, it should be removed to avoid confusion. If it's intended for future use, add a comment explaining its purpose.
| .set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5 | |
| .set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5 // Reserved/defined for potential future use in context-save related traps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be: SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT ?
| .set SQ_WAVE_PC_HI_TRAP_ID_BFE , (SQ_WAVE_PC_HI_TRAP_ID_SHIFT | (SQ_WAVE_PC_HI_TRAP_ID_SIZE << 16)) | ||
| .set SQ_WAVE_PC_HI_TRAP_ID_SHIFT , 28 | ||
| .set SQ_WAVE_PC_HI_TRAP_ID_SIZE , 4 | ||
| .set SQ_WAVE_PC_HI_TRAP_ID_MASK , (((1 << SQ_WAVE_PC_HI_TRAP_ID_SIZE) - 1) << SQ_WAVE_PC_HI_TRAP_ID_SHIFT) |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constant SQ_WAVE_PC_HI_TRAP_ID_MASK is defined but never used in this file. If it's not used, it should be removed to avoid confusion. If it's intended for future use, add a comment explaining its purpose.
| .set SQ_WAVE_PC_HI_TRAP_ID_MASK , (((1 << SQ_WAVE_PC_HI_TRAP_ID_SIZE) - 1) << SQ_WAVE_PC_HI_TRAP_ID_SHIFT) |
|
|
||
| s_wait_storecnt 0 // wait for signal value 0 to be written to amd_signal_t.value | ||
|
|
||
| v_writelane_b32 v0, ttmp13, 0x0 // v[0] = 0 (event ID low part) |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says "v[0] = 0 (event ID low part)" but the instruction is writing ttmp13 (which contains the event_id loaded on line 844) to v0, not the value 0. The comment should be corrected to indicate that v[0] receives the event_id value.
| v_writelane_b32 v0, ttmp13, 0x0 // v[0] = 0 (event ID low part) | |
| v_writelane_b32 v0, ttmp13, 0x0 // v[0] = event_id (low part) |
| s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message | ||
| s_wait_kmcnt 0 | ||
| s_mov_b32 m0, ttmp14 // restore m0 | ||
| s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14 |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says "Backup m0 (event ID low part)" but at this point m0 doesn't contain the event ID yet - it still contains whatever was in m0 before. The event ID is moved into m0 on line 865. The comment should be corrected to say "Backup m0 to ttmp14" without the misleading "(event ID low part)" part.
| s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14 | |
| s_mov_b32 ttmp14, m0 // Backup m0 to ttmp14 |
| s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14 | ||
| v_readlane_b32 ttmp15, v0, 0 // Read event ID low part from v0 into ttmp15 | ||
| s_mov_b32 m0, ttmp15 // Set m0 to event ID (low part) | ||
| s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message to host | ||
| s_wait_kmcnt 0 // Wait for interrupt to complete | ||
| s_mov_b32 m0, ttmp14 // Restore m0 to original value (event ID low part) |
Copilot
AI
Jan 26, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says "Restore m0 to original value (event ID low part)" but ttmp14 contains the original m0 value (saved on line 863), not the event ID. The comment should be corrected to say "Restore m0 to original value" without the misleading "(event ID low part)" part.
| s_mov_b32 ttmp14, m0 // Backup m0 (event ID low part) to ttmp14 | |
| v_readlane_b32 ttmp15, v0, 0 // Read event ID low part from v0 into ttmp15 | |
| s_mov_b32 m0, ttmp15 // Set m0 to event ID (low part) | |
| s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message to host | |
| s_wait_kmcnt 0 // Wait for interrupt to complete | |
| s_mov_b32 m0, ttmp14 // Restore m0 to original value (event ID low part) | |
| s_mov_b32 ttmp14, m0 // Backup m0 to ttmp14 | |
| v_readlane_b32 ttmp15, v0, 0 // Read event ID low part from v0 into ttmp15 | |
| s_mov_b32 m0, ttmp15 // Set m0 to event ID (low part) | |
| s_sendmsg sendmsg(MSG_INTERRUPT) // send interrupt message to host | |
| s_wait_kmcnt 0 // Wait for interrupt to complete | |
| s_mov_b32 m0, ttmp14 // Restore m0 to original value |
dayatsin-amd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit picks
|
|
||
| .set SQ_WAVE_EXCP_FLAG_PRIV_ADDR_WATCH_MASK , (1 << 4) - 1 | ||
| .set SQ_WAVE_EXCP_FLAG_PRIV_MEMVIOL_SHIFT , 4 | ||
| .set SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT , 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be: SQ_WAVE_EXCP_FLAG_PRIV_SAVE_CONTEXT_SHIFT ?
| .set SQ_WAVE_TRAP_CTRL_MATH_EXCP_MASK , ((1 << 7) - 1) | ||
| .set SQ_WAVE_TRAP_CTRL_ADDR_WATCH_SHIFT , 7 | ||
| .set SQ_WAVE_TRAP_CTRL_WAVE_END_SHIFT , 8 | ||
| .set SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST , 9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was already there before your patch, but should this be SQ_WAVE_TRAP_CTRL_TRAP_AFTER_INST_SHIFT ?
Motivation
To ease future maintenance, I'm proposing this PR for simplifying 2nd level trap handler. This PR aims to ease future integration of new architectures.
Technical Details
JIRA ID
Test Plan
Test Result
Submission Checklist