ROCM-1617 - Resolve hip mem array and some mempool issues #2860
Motivation
The current CI is occasionally flaky, with failures caused by undefined behavior (UB) and by correctness issues. This change addresses some of those issues in the memory tests.
Technical Details
hip_test_checkers.hh
* Index with size_t instead of a signed integer; size_t matches the type of element counts and avoids signed overflow
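A minimal sketch of the indexing change, assuming a simple element-by-element checker. The function name `checkBuffersEqual` is hypothetical and not taken from hip_test_checkers.hh.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical checker: compares two host buffers element by element.
// The loop index is size_t, the same type as the element count, so the
// comparison i < count never mixes signed and unsigned types and the index
// cannot hit signed overflow (UB) when count exceeds INT_MAX.
template <typename T>
bool checkBuffersEqual(const T* expected, const T* actual, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (expected[i] != actual[i]) {
      return false;
    }
  }
  return true;
}
```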
hip_test_common.hh
* Add a ceiling-division function for kernel launch size calculations
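Ceiling division gives the number of blocks needed to cover every element when the element count is not a multiple of the block size. The sketch below assumes unsigned operands; the name `ceilDiv` is illustrative and the helper in hip_test_common.hh may be named or typed differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns ceil(numerator / denominator) using integer
// arithmetic only. Written as quotient + (remainder != 0) rather than
// (numerator + denominator - 1) / denominator, so it cannot overflow when
// numerator is close to SIZE_MAX. The caller must ensure denominator != 0.
constexpr std::size_t ceilDiv(std::size_t numerator, std::size_t denominator) {
  return numerator / denominator + (numerator % denominator != 0 ? 1 : 0);
}
```

For example, a launch over `N` elements with 256 threads per block would use `ceilDiv(N, 256)` blocks.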
memcpy3d_tests_common.hh
* Use RAII class to handle peer access of device memory
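A minimal sketch of the RAII pattern for peer access, assuming the guard only needs an "enable" step that may fail and a matching "disable" step. In a HIP test these would wrap `hipDeviceEnablePeerAccess` and `hipDeviceDisablePeerAccess`; the class name `PeerAccessGuard` and its callback-based constructor are hypothetical, chosen so the sketch compiles without the HIP runtime.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical RAII guard: enables access on construction and disables it on
// every scope exit (including stack unwinding), but only if enabling actually
// succeeded, so it never undoes state it did not create.
class PeerAccessGuard {
 public:
  PeerAccessGuard(std::function<bool()> enable, std::function<void()> disable)
      : disable_(std::move(disable)), enabled_(enable()) {}
  ~PeerAccessGuard() {
    if (enabled_) disable_();
  }
  PeerAccessGuard(const PeerAccessGuard&) = delete;
  PeerAccessGuard& operator=(const PeerAccessGuard&) = delete;

  bool enabled() const { return enabled_; }

 private:
  std::function<void()> disable_;  // declared first: initialized before enabled_
  bool enabled_;
};
```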
hipArrayCommon.hh
* Add guard to prevent out of bounds memory access
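When the grid is rounded up to whole blocks, the last block contains threads whose global index lies past the end of the buffer. The sketch below emulates such a launch on the CPU to show why an `idx < n` guard is needed; the function names and the 1D layout are illustrative assumptions, not the actual kernel in hipArrayCommon.hh.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of a 1D kernel body. Threads with idx >= n exist because the
// grid was rounded up; the guard stops them from writing out of bounds.
void fillKernelBody(std::size_t blockIdx, std::size_t blockDim,
                    std::size_t threadIdx, int* out, std::size_t n) {
  const std::size_t idx = blockIdx * blockDim + threadIdx;
  if (idx < n) {
    out[idx] = static_cast<int>(idx);
  }
}

// Emulates launching ceil(n / blockDim) blocks of blockDim threads and
// returns the total number of threads that ran.
std::size_t emulateLaunch(int* out, std::size_t n, std::size_t blockDim) {
  const std::size_t blocks = n / blockDim + (n % blockDim != 0 ? 1 : 0);
  for (std::size_t b = 0; b < blocks; ++b) {
    for (std::size_t t = 0; t < blockDim; ++t) {
      fillKernelBody(b, blockDim, t, out, n);
    }
  }
  return blocks * blockDim;
}
```

With 10 elements and 4 threads per block, 12 threads run; without the guard, the last two would write past the buffer.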
hipArrayCreate.cc
* std::iota with signed char can easily overflow
* TODO: Change this approach, as the current fix is only a workaround
* Alter kernel launch to better handle edge cases
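With `std::iota` on a `signed char` buffer, any element past index 127 would need a value above `SCHAR_MAX`, which `signed char` cannot hold; the converted value (typically wrapping to -128) no longer matches an expected value computed in a wider type. One possible approach, sketched here as an assumption and not as the fix in this PR, is to count in a wide type and fold each value into the representable range explicitly. `makePattern` is a hypothetical name.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical fill routine: every stored value lies in [0, SCHAR_MAX], so it
// is always representable, and the host-side expectation for element i can be
// recomputed with the same formula, i % (SCHAR_MAX + 1).
std::vector<signed char> makePattern(std::size_t count) {
  std::vector<signed char> data(count);
  for (std::size_t i = 0; i < count; ++i) {
    data[i] = static_cast<signed char>(i % (SCHAR_MAX + 1));
  }
  return data;
}
```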
hipArrayGetDescriptor.cc
* The multi-threaded test case now updates the pass/fail value atomically, so it can be written safely from several threads
* Handle the missing default case by issuing a warning and ending the test
* Update funcToChkArray to handle the 2D case, since a 2D array is passed in
* Properly run Unit_hipArrayGetDescriptor_Host2Array_Array2Host for 2D cases
* Handle a potential double free when testing negative parameters
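A minimal sketch of a thread-safe pass/fail flag, assuming each worker thread runs an independent check. Writing a plain `bool` from several threads is a data race (UB); a `std::atomic<bool>` avoids it. The helper `runChecksConcurrently` and its `checkFn` callback are hypothetical stand-ins for the per-thread `hipArrayGetDescriptor` checks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Runs checkFn(threadIndex) on numThreads threads and reports whether every
// check passed. Any failing thread clears the shared atomic flag.
template <typename CheckFn>
bool runChecksConcurrently(int numThreads, CheckFn checkFn) {
  std::atomic<bool> allPassed{true};
  std::vector<std::thread> workers;
  workers.reserve(numThreads);
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&allPassed, &checkFn, t] {
      if (!checkFn(t)) {
        // Relaxed ordering is enough: join() below synchronizes with each thread.
        allPassed.store(false, std::memory_order_relaxed);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return allPassed.load(std::memory_order_relaxed);
}
```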
hipArrayGetInfo.cc
* Double free: the underlying pointer of ArrayAllocGuard is freed by the test, but the guard object is not aware that the memory has been released and frees it again
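The general way to keep an owning guard in sync with memory freed elsewhere is to hand ownership back before the manual free, as `std::unique_ptr::release` does. The sketch below is a simplified, hypothetical `OwningGuard`, not the real `ArrayAllocGuard` interface; the free callback stands in for a call such as `hipFreeArray`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical guard: frees its handle in the destructor unless ownership was
// given up with release(). Releasing before a manual free prevents the
// destructor from freeing the same handle a second time.
template <typename Handle>
class OwningGuard {
 public:
  OwningGuard(Handle h, std::function<void(Handle)> freeFn)
      : handle_(h), freeFn_(std::move(freeFn)) {}
  ~OwningGuard() {
    if (handle_) freeFn_(handle_);
  }
  OwningGuard(const OwningGuard&) = delete;
  OwningGuard& operator=(const OwningGuard&) = delete;

  Handle get() const { return handle_; }
  // Gives up ownership; the caller is now responsible for freeing the handle.
  Handle release() { return std::exchange(handle_, Handle{}); }

 private:
  Handle handle_;
  std::function<void(Handle)> freeFn_;
};
```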
hipDeviceGetMemPool.cc && hipDeviceSetMemPool.cc
* Reset the default mem pool after its attributes have been altered
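One way to guarantee the reset even when a test bails out early is a save-and-restore guard that captures an attribute on construction and writes it back on scope exit. This is a sketch under stated assumptions: in a HIP test the getter and setter would wrap `hipMemPoolGetAttribute` and `hipMemPoolSetAttribute` on the device's default pool, and the class name `AttributeRestoreGuard` is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical guard: saves one attribute value when constructed and restores
// it in the destructor, which also runs during stack unwinding (for example,
// when a test assertion throws).
template <typename T>
class AttributeRestoreGuard {
 public:
  AttributeRestoreGuard(std::function<T()> get, std::function<void(T)> set)
      : set_(std::move(set)), saved_(get()) {}
  ~AttributeRestoreGuard() { set_(saved_); }
  AttributeRestoreGuard(const AttributeRestoreGuard&) = delete;
  AttributeRestoreGuard& operator=(const AttributeRestoreGuard&) = delete;

 private:
  std::function<void(T)> set_;  // declared first: initialized before saved_
  T saved_;
};
```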
hipFreeArray.cc
* Access array elements by value rather than by reference
memcpy2d_tests_common.hh
* Handle setting and unsetting device peer access
mempool_common.hh
* Add RAII class for default mem pool
* Handle a potential division by zero caused by the start time. The case is unlikely, but worth handling nonetheless
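If the measured start and end times coincide, the elapsed time is zero, and division by zero is undefined behavior in C++. A minimal sketch of the guard, assuming a rate computed from a byte count and an elapsed time in milliseconds; the function name and units are illustrative, not taken from mempool_common.hh.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: returns bytes per millisecond, or std::nullopt when the
// elapsed time is not positive. Writing the test as !(elapsedMs > 0.0) also
// rejects NaN, since every comparison with NaN is false.
std::optional<double> bytesPerMs(double bytes, double elapsedMs) {
  if (!(elapsedMs > 0.0)) {
    return std::nullopt;
  }
  return bytes / elapsedMs;
}
```

Returning an empty `std::optional` lets the caller decide whether to skip the measurement or retry, instead of reporting a meaningless rate.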
JIRA ID
Partial solution: ROCM-1617
Test Plan
NA
Submission Checklist