Skip to content

Conversation

@AidanBeltonS
Copy link
Contributor

Motivation

The current CI occasionally is flakey with failures due to UB, and correctness issues. This change addresses some of those issues in the memory tests

Technical Details

hip_test_checkers.hh
* indexing with signed integer, size_t is a safer choice
hip_test_common.hh
* Added ceiling division function, for kernel size calculations
memcpy3d_tests_common.hh
* Use RAII class to handle peer access of device memory
hipArrayCommon.hh
* Add guard to prevent out of bounds memory access
hipArrayCreate.cc
* std::iota with signed char, can easily overflow.
* TODO: Change approach to this, as fix isn't really a fix
* Alter kernel launch to better handle edge cases
hipArrayGetDescriptor.cc
* Multi-threaded test case now updated passing/failing value atomically and safely
* Handle missing default case by issuing warning and ending test
* funcToChkArray updates to handle 2D case, as we pass 2D array in
* Properly run Unit_hipArrayGetDescriptor_Host2Array_Array2Host for 2D cases
* Handle potential double free, when testing negative parameters
hipArrayGetInfo.cc
* Double free: Underlying pointer was of ArrayAllocGuard gets released, but object isn't aware memory has been released.
hipDeviceGetMemPool.cc && hipDeviceSetMemPool.cc
* Reset the default mem pool after its attributes have been altered
hipFreeArray.cc
* accessing array elements by value rather than reference
memcpy2d_tests_common.hh
* Handle setting and unsetting device peer access
mempool_common.hh
* Add RAII class for default mem pool
* Handle potential zero division, caused by start time. Unlikely case, but worth handling non the less

JIRA ID

Partial solution: ROCM-1617

Test Plan

NA

Submission Checklist

@AidanBeltonS AidanBeltonS requested a review from a team as a code owner January 26, 2026 15:34
Copilot AI review requested due to automatic review settings January 26, 2026 15:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR resolves memory-related undefined behavior and correctness issues in HIP memory tests, specifically addressing CI flakiness. The changes introduce RAII-based resource management, fix multi-threading safety issues, prevent memory access violations, and correct integer overflow/underflow problems.

Changes:

  • Introduced RAII classes (MemPoolAttrRestore, PeerAccessGuard) for automatic resource cleanup and memory pool attribute restoration
  • Fixed thread-safety issues in multi-threaded tests by using atomic operations for shared state variables
  • Corrected array indexing and iteration bounds to prevent out-of-bounds access and handle edge cases in kernel launches
  • Resolved potential integer overflow in std::iota with signed char and division by zero in timing calculations

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
mempool_common.hh Added RAII class for mem pool attribute restoration and zero-division guards in timing kernels
hipFreeArray.cc Fixed iterator to use reference instead of value copy for thread-safe array operations
hipDeviceSetMemPool.cc Added mem pool attribute restoration using RAII
hipDeviceGetMemPool.cc Added mem pool attribute restoration using RAII (two instances)
hipArrayGetInfo.cc Fixed potential double-free by manual memory management and prevented use-after-free
hipArrayGetDescriptor.cc Fixed multi-threading safety with atomic variables and corrected 2D array test bounds
hipArrayCreate.cc Fixed signed char overflow in std::iota and improved kernel launch grid calculation
hipArrayCommon.hh Added bounds checking to prevent out-of-bounds memory access in kernel
memcpy3d_tests_common.hh Introduced RAII-based peer access management
hip_test_checkers.hh Changed offset calculation from signed to unsigned integer type

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants