[rocprofiler-compute] [Documentation] Add metric descriptions for missing gfx950 metrics #2874

vedithal-amd · 2026-01-27T00:55:49Z

Motivation

This PR adds metric descriptions for 48 gfx950 metrics that were previously missing descriptions across the following hardware blocks:

Command Processor (CPC): 3 metrics
Workgroup Manager (SPI): 3 metrics
Instruction Mix: 2 metrics
Compute Pipeline: 3 metrics
Local Data Share (LDS): 8 metrics
TA/TD Units: 3 metrics
vL1D Cache: 18 metrics
L2 Cache: 9 metrics

These descriptions provide essential context for users profiling and analyzing performance on gfx950 architecture, including details on hardware counters, event IDs, and architectural behavior.

Technical Details

This PR uses material such as:

counter definition
metric formulas used in avg/min/max/value/peak fields
metric descriptions of similar metrics
MI350 architecture documentation

This information was fed to a RAG-based LLM agent, which derived the metric descriptions.

Changes:

Added metric descriptions to analysis config YAML files for gfx950
Updated per-architecture metric definitions (gfx950_metrics_description.yaml)
Public-facing documentation (docs/data/metrics_description.yaml) has been updated with all new metric descriptions
Regenerated delta files for older architectures (gfx941, gfx942)
Updated hash database for configuration integrity verification
All pre-commit hooks pass successfully
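
The hash database mentioned in the list above is not shown in this description. As a rough sketch of how such an integrity check could work, assuming SHA-256 digests and a flat JSON mapping from relative file path to digest (both are assumptions; the real `.config_hashes.json` layout and algorithm may differ, and all function names here are hypothetical):

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (algorithm is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_hash_db(config_dir: Path) -> dict[str, str]:
    """Map each YAML file's path (relative to config_dir) to its digest."""
    return {
        p.relative_to(config_dir).as_posix(): file_digest(p)
        for p in sorted(config_dir.rglob("*.yaml"))
    }


def find_mismatches(config_dir: Path, db_path: Path) -> list[str]:
    """List config files whose current digest differs from the stored one."""
    stored = json.loads(db_path.read_text())
    current = build_hash_db(config_dir)
    return sorted(
        name
        for name in current.keys() | stored.keys()
        if current.get(name) != stored.get(name)
    )


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway config tree, record hashes, edit a file, and re-check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cfg = root / "gfx950"
        cfg.mkdir()
        (cfg / "1700_l2_cache.yaml").write_text("Bypass Req: old text\n")
        db = root / "hashes.json"
        db.write_text(json.dumps(build_hash_db(cfg)))
        before = find_mismatches(cfg, db)
        # Simulate adding a description without regenerating the database.
        (cfg / "1700_l2_cache.yaml").write_text("Bypass Req: new text\n")
        after = find_mismatches(cfg, db)
        return before, after
```

In this sketch, editing a description without regenerating the database makes the file show up as a mismatch, which is presumably why the hash database had to be updated alongside the YAML changes.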

JIRA ID

ROCM-1126

AIPROFCOMP-9

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Add metric descriptions for gfx950 metrics that were previously missing descriptions across multiple hardware blocks:

- Command Processor (CPC): 3 metrics
- Workgroup Manager (SPI): 3 metrics
- Instruction Mix: 2 metrics
- Compute Pipeline: 3 metrics
- Local Data Share (LDS): 8 metrics
- TA/TD Units: 3 metrics
- vL1D Cache: 18 metrics
- L2 Cache: 9 metrics

Public-facing documentation has also been updated with this information.

Copilot

Pull request overview

This PR adds metric descriptions for 48 previously undocumented gfx950 (AMD Instinct MI350 series) performance metrics. The additions provide essential context for users profiling and analyzing performance on the gfx950 architecture.

Changes:

Added metric descriptions across 8 hardware blocks: CPC (3 metrics), SPI (3 metrics), Instruction Mix (2 metrics), Compute Pipeline (3 metrics), LDS (8 metrics), TA/TD (3 metrics), vL1D Cache (18 metrics), and L2 Cache (9 metrics)
Updated per-architecture YAML configuration files and public-facing documentation
Regenerated configuration hash database for integrity verification

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
gfx950_metrics_description.yaml	Main per-architecture metric definitions file with 48 new metric descriptions
.config_hashes.json	Updated hash values for all modified configuration files to maintain integrity verification
1700_l2_cache.yaml	Added descriptions for 9 L2 cache metrics including Bypass Req, Input Buffer Req, Atomic-HBM, and Read 128B
1600_vector_l1_data_cache.yaml	Added descriptions for 18 vL1D cache metrics including stall metrics, Tag RAM requests, and UTCL1 metrics
1500_address_processing_unit_and_data_return_path_ta_td.yaml	Added descriptions for 3 TA/TD instruction metrics for LDS-direct operations
1200_local_data_share_lds.yaml	Added descriptions for 8 LDS metrics including bandwidth metrics, instruction counts, and FIFO full rates
1100_compute_units_compute_pipeline.yaml	Added descriptions for 3 compute pipeline metrics including F8/F6F4 ops and VALU co-issue efficiency
1000_compute_units_instruction_mix.yaml	Added descriptions for 2 instruction mix metrics (MFMA-F6F4, Spill/Stack Coalesceable)
0600_workgroup_manager_spi.yaml	Added descriptions for 3 SPI metrics including scheduler-pipe wave occupancy and utilization
0500_command_processor_cpc_cpf.yaml	Added descriptions for 3 CPC metrics including SYNC FIFO, CANE stall, and ADC utilization
metrics_description.yaml	Updated public-facing documentation with all 48 new metric descriptions

Copilot · 2026-01-27T00:59:54Z

projects/rocprofiler-compute/docs/data/metrics_description.yaml

+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 0, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    unit: Requests per Normalization Unit
+  Tag RAM 1 Req:
+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 1, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    unit: Requests per Normalization Unit
+  Tag RAM 2 Req:
+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 2, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    unit: Requests per Normalization Unit
+  Tag RAM 3 Req:
+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 3, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.


The descriptions for Tag RAM 0-3 Req metrics state "L2 cache requests from this vL1D" which is potentially confusing. These are more accurately described as requests within the vL1D cache lookup process that need to check Tag RAM banks, not L2 cache requests. The wording could be clarified to say "vL1D cache requests that mapped to Tag RAM bank X" or "cache line lookups that mapped to Tag RAM bank X" to better reflect that these are vL1D internal operations.

Suggested change

    rst: The total number of vL1D cache line lookup requests that mapped to Tag RAM bank 0, per normalization unit. These are internal vL1D tag lookup operations, not L2 cache requests. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
    unit: Requests per Normalization Unit
  Tag RAM 1 Req:
    rst: The total number of vL1D cache line lookup requests that mapped to Tag RAM bank 1, per normalization unit. These are internal vL1D tag lookup operations, not L2 cache requests. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
    unit: Requests per Normalization Unit
  Tag RAM 2 Req:
    rst: The total number of vL1D cache line lookup requests that mapped to Tag RAM bank 2, per normalization unit. These are internal vL1D tag lookup operations, not L2 cache requests. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
    unit: Requests per Normalization Unit
  Tag RAM 3 Req:
    rst: The total number of vL1D cache line lookup requests that mapped to Tag RAM bank 3, per normalization unit. These are internal vL1D tag lookup operations, not L2 cache requests. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
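
To make the "distribution across banks" remark concrete, here is a small sketch that turns four per-bank request counts into shares and an imbalance ratio. The function name and the sample counts are hypothetical, and the ratio is only an illustrative summary, not a metric defined by rocprofiler-compute:

```python
def bank_distribution(requests_per_bank: list[float]) -> tuple[list[float], float]:
    """Return each bank's share of requests and the max/mean imbalance ratio.

    A ratio of 1.0 means requests are spread evenly; larger values mean one
    bank receives a disproportionate share of tag lookups.
    """
    total = sum(requests_per_bank)
    if total == 0:
        # No requests recorded: report zero shares and no imbalance.
        return [0.0] * len(requests_per_bank), 1.0
    shares = [count / total for count in requests_per_bank]
    mean = total / len(requests_per_bank)
    return shares, max(requests_per_bank) / mean


# Hypothetical Tag RAM 0-3 Req values, made up for illustration.
shares, imbalance = bank_distribution([400, 200, 200, 200])
```

With these made-up counts, bank 0 handles 40% of the lookups and the imbalance ratio is 1.6, so a reader comparing the four Tag RAM metrics would see that bank 0 is the hot spot.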

Copilot · 2026-01-27T00:59:54Z

projects/rocprofiler-compute/tools/per_arch_metric_definitions/gfx950_metrics_description.yaml

+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 0, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    unit: Requests per Normalization Unit
+  Tag RAM 1 Req:
+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 1, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    unit: Requests per Normalization Unit
+  Tag RAM 2 Req:
+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 2, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    unit: Requests per Normalization Unit
+  Tag RAM 3 Req:
+    rst: The total number of L2 cache requests from this vL1D that mapped to Tag RAM bank 3, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.


The descriptions for Tag RAM 0-3 Req metrics state "L2 cache requests from this vL1D" which is potentially confusing. These are more accurately described as requests within the vL1D cache lookup process that need to check Tag RAM banks, not L2 cache requests. The wording could be clarified to say "vL1D cache requests that mapped to Tag RAM bank X" or "cache line lookups that mapped to Tag RAM bank X" to better reflect that these are vL1D internal operations.

Suggested change

    rst: The total number of vL1D cache requests (cache line lookups) that mapped to Tag RAM bank 0, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
    unit: Requests per Normalization Unit
  Tag RAM 1 Req:
    rst: The total number of vL1D cache requests (cache line lookups) that mapped to Tag RAM bank 1, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
    unit: Requests per Normalization Unit
  Tag RAM 2 Req:
    rst: The total number of vL1D cache requests (cache line lookups) that mapped to Tag RAM bank 2, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.
    unit: Requests per Normalization Unit
  Tag RAM 3 Req:
    rst: The total number of vL1D cache requests (cache line lookups) that mapped to Tag RAM bank 3, per normalization unit. The vL1D cache uses multiple Tag RAM banks for parallel tag lookups. Distribution across banks affects lookup efficiency.

Copilot · 2026-01-27T00:59:54Z

...filer-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml

+      The total number of L2 cache requests from this vL1D that mapped to Tag RAM
+      bank 0, per normalization unit. The vL1D cache uses multiple Tag RAM banks
+      for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    Tag RAM 1 Req: >-
+      The total number of L2 cache requests from this vL1D that mapped to Tag RAM
+      bank 1, per normalization unit. The vL1D cache uses multiple Tag RAM banks
+      for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    Tag RAM 2 Req: >-
+      The total number of L2 cache requests from this vL1D that mapped to Tag RAM
+      bank 2, per normalization unit. The vL1D cache uses multiple Tag RAM banks
+      for parallel tag lookups. Distribution across banks affects lookup efficiency.
+    Tag RAM 3 Req: >-
+      The total number of L2 cache requests from this vL1D that mapped to Tag RAM
+      bank 3, per normalization unit. The vL1D cache uses multiple Tag RAM banks
+      for parallel tag lookups. Distribution across banks affects lookup efficiency.


The descriptions for Tag RAM 0-3 Req metrics state "L2 cache requests from this vL1D" which is potentially confusing. These are more accurately described as requests within the vL1D cache lookup process that need to check Tag RAM banks, not L2 cache requests. The wording could be clarified to say "vL1D cache requests that mapped to Tag RAM bank X" or "cache line lookups that mapped to Tag RAM bank X" to better reflect that these are vL1D internal operations.

Suggested change

      The total number of vL1D cache requests (cache line lookups) that mapped to
      Tag RAM bank 0 during the vL1D tag lookup process, per normalization unit.
      The vL1D cache uses multiple Tag RAM banks for parallel tag lookups.
      Distribution across banks affects lookup efficiency.
    Tag RAM 1 Req: >-
      The total number of vL1D cache requests (cache line lookups) that mapped to
      Tag RAM bank 1 during the vL1D tag lookup process, per normalization unit.
      The vL1D cache uses multiple Tag RAM banks for parallel tag lookups.
      Distribution across banks affects lookup efficiency.
    Tag RAM 2 Req: >-
      The total number of vL1D cache requests (cache line lookups) that mapped to
      Tag RAM bank 2 during the vL1D tag lookup process, per normalization unit.
      The vL1D cache uses multiple Tag RAM banks for parallel tag lookups.
      Distribution across banks affects lookup efficiency.
    Tag RAM 3 Req: >-
      The total number of vL1D cache requests (cache line lookups) that mapped to
      Tag RAM bank 3 during the vL1D tag lookup process, per normalization unit.
      The vL1D cache uses multiple Tag RAM banks for parallel tag lookups.
      Distribution across banks affects lookup efficiency.
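
Because the same phrase recurs for all four banks in each of the three files, the rewording suggested in these comments could be applied mechanically. A minimal sketch, assuming the descriptions are available as plain strings (the function name is hypothetical, and the replacement wording follows one of the suggestions in this review rather than any final merged text):

```python
import re

# Matches the original wording, tolerating a line wrap before "bank N"
# as seen in the folded (>-) YAML blocks.
OLD_PHRASE = re.compile(
    r"L2 cache requests from this vL1D that mapped to\s+Tag RAM\s+bank (\d)"
)


def reword_tag_ram_description(text: str) -> str:
    """Replace the 'L2 cache requests' wording with the vL1D lookup wording."""
    return OLD_PHRASE.sub(
        r"vL1D cache requests (cache line lookups) that mapped to Tag RAM bank \1",
        text,
    )
```

Text that does not contain the original phrase passes through unchanged, so the function can be run over every description in a file without touching unrelated metrics.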

Copilot AI review requested due to automatic review settings January 27, 2026 00:55

vedithal-amd requested review from a team and prbasyal-amd as code owners January 27, 2026 00:55

github-actions bot added the documentation and project: rocprofiler-compute labels Jan 27, 2026

Copilot started reviewing on behalf of vedithal-amd January 27, 2026 00:56

Copilot AI reviewed Jan 27, 2026

vedithal-amd changed the title ~~[rocprofiler-compute] Add metric descriptions for missing gfx950 metrics~~ [rocprofiler-compute] [Documentation] Add metric descriptions for missing gfx950 metrics Jan 27, 2026

Update CHANGELOG

d925c53

systems-assistant bot added the organization: ROCm label Jan 27, 2026
