Deprecate the DeviceNDArray class and public APIs that return instances #546
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
```python
@functools.wraps(func)
def wrapper(*args, **kwargs):
    warnings.warn(
        f"{func.__name__} api is deprecated. Please prefer cupy for array functions",
        category=DeprecatedNDArrayAPIWarning,  # warning class added by this PR
    )
    return func(*args, **kwargs)
```
cupy arrays are much slower than DeviceNDArray because they require creating an external (i.e., non-numba-cuda-created) stream, so I'm not sure a recommendation for that is what we should do right now.
I was thinking that we could keep the top-level APIs (device_array etc.) and replace their internals with StridedMemoryView or something similar, so that folks can construct arrays as cheaply as possible.
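To make the stream point above concrete, here is a rough sketch of the interop being described — not code from this PR, and it assumes numba-cuda's cuda.external_stream and CuPy's public stream API:

```python
import cupy as cp
import numpy as np
from numba import cuda

# CuPy owns its own streams; numba-cuda can only wrap them as "external"
# streams rather than creating them itself, which is the cost referred
# to in the comment above.
cp_stream = cp.cuda.Stream(non_blocking=True)
nb_stream = cuda.external_stream(cp_stream.ptr)

with cp_stream:
    arr = cp.zeros(1024, dtype=np.float64)

@cuda.jit
def inc(a):
    i = cuda.grid(1)
    if i < a.size:
        a[i] += 1.0

# Launch on the wrapped stream so work stays ordered with CuPy's stream.
inc[8, 128, nb_stream](arr)
cp_stream.synchronize()
```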
I concur that a lightweight device-array-like container should exist; I'm just not sure that numba-cuda should necessarily be the library providing it publicly. I think we should nudge users away from using numba-cuda for that, e.g. for moving data from host to device. That said, I'm open to suggestions on what we should recommend.
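For illustration, a minimal sketch of the kind of lightweight container being discussed, built only on numba-cuda pieces (cuda.current_context().memalloc and MemoryPointer.device_ctypes_pointer are assumed to behave as in numba's driver layer); the class name and design are illustrative, not this PR's:

```python
import numpy as np
from numba import cuda

class LightDeviceArray:
    """Bare-bones device allocation exposing only the CUDA Array
    Interface -- illustrative, not this PR's implementation."""

    def __init__(self, shape, dtype=np.float64):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        nbytes = int(np.prod(self.shape)) * self.dtype.itemsize
        # memalloc needs an active CUDA context -- the same constraint
        # behind the @require_context comments later in this review.
        self._mem = cuda.current_context().memalloc(nbytes)

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self.shape,
            "typestr": self.dtype.str,
            "data": (self._mem.device_ctypes_pointer.value, False),
            "strides": None,  # C-contiguous
            "version": 3,
        }
```

Anything that speaks the CUDA Array Interface (CuPy, numba-cuda kernels) could then consume such an object zero-copy.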
/ok to test
2 files reviewed, 2 comments
```diff
 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
     shape, strides, dtype = prepare_shape_strides_dtype(
         shape, strides, dtype, order
     )
-    return devicearray.DeviceNDArray(
+    return devicearray.DeviceNDArray._create_nowarn(
         shape=shape, strides=strides, dtype=dtype, stream=stream
     )
```
[P0] The internal _device_array function is missing the @require_context decorator. This is inconsistent with the same function in _api.py (line 143), which has the decorator.
Since DeviceNDArray._create_nowarn requires an active CUDA context (it calls devices.get_context().memalloc() when allocating memory), calling this function without an active context will cause a runtime error. The public device_array function gets the decorator via its outer function, but _device_array itself can be called directly from other modules (e.g., cuda.kernels.reduction, cuda.vectorizers, cuda.random).
Suggested change:

```diff
+@require_context
 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
     shape, strides, dtype = prepare_shape_strides_dtype(
         shape, strides, dtype, order
     )
     return devicearray.DeviceNDArray._create_nowarn(
         shape=shape, strides=strides, dtype=dtype, stream=stream
     )
```
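For readers unfamiliar with the decorator, here is a simplified approximation of what @require_context guarantees; the real implementation lives in numba.cuda.cudadrv.devices and differs in detail:

```python
import functools
from numba.cuda.cudadrv.devices import get_context

def require_context(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        get_context()  # creates/activates a CUDA context if none exists yet
        return fn(*args, **kwargs)
    return wrapper
```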
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
[P2] Unnecessary complexity: converting res through __cuda_array_interface__ when it's already sliceable.
The res parameter is a device array (a CuPy array in the tests) that already supports slicing. The old code simply used res[:1], which worked correctly. Converting it through _from_cuda_array_interface(res.__cuda_array_interface__) adds overhead and complexity without any functional benefit.
Since res implements the CUDA array interface, it can be sliced directly; the [:1] operation works on any object that implements __getitem__ properly, including DeviceNDArray and other CUDA-compatible arrays.
Suggested change:

```diff
-cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
-    :1
-].copy_to_device(partials[:1], stream=stream)
+res[:1].copy_to_device(partials[:1], stream=stream)
```
Part of #471

- DeprecatedNDArrayAPIWarning emitted from all user-facing functions for moving data around (cuda.to_device, driver.host_to_device, device_to_host, also as_cuda_array, is_cuda_array, etc.)
- DeviceNDArray ctor
- DeviceNDArray._create_nowarn
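A hedged sketch of the intended user-visible behavior: the warning class name comes from this PR's description, while the exact message and filter behavior are assumptions.

```python
import warnings
import numpy as np
from numba import cuda

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    d_arr = cuda.to_device(np.arange(4))  # one of the user-facing APIs listed above

print([type(w.message).__name__ for w in caught])
# Expected to include 'DeprecatedNDArrayAPIWarning' once this PR lands.
```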