
@brandon-b-miller (Contributor) commented Oct 24, 2025

Part of #471

  • Adds a DeprecatedNDArrayAPIWarning emitted from all user-facing functions for moving data around (cuda.to_device, driver.host_to_device, device_to_host, as_cuda_array, is_cuda_array, etc.); a sketch of the warning pattern follows this list
  • Separates the existing, now-deprecated APIs into internal non-warning versions and external warning versions
  • Adds a deprecation warning to the DeviceNDArray ctor
  • Adds DeviceNDArray._create_nowarn
  • Removes as many usages of the deprecated APIs as possible from the test suite in favor of cupy arrays
  • Catches warnings in tests of the currently exposed, now-deprecated APIs
  • Where absolutely necessary, tests call the internal non-warning versions of the deprecated APIs
  • Reworks tests to avoid these APIs as much as possible
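
For context, a minimal sketch of the warning pattern described above, matching the decorator snippet discussed later in this thread. The FutureWarning base class and the _deprecated_api helper name are illustrative assumptions, not necessarily what the PR uses:

    import functools
    import warnings

    class DeprecatedNDArrayAPIWarning(FutureWarning):
        """Emitted by deprecated numba-cuda NDArray APIs."""

    def _deprecated_api(func):
        # Wrap a public API so each call emits the deprecation warning,
        # then defer to the original implementation.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} api is deprecated. "
                "Please prefer cupy for array functions",
                DeprecatedNDArrayAPIWarning,
            )
            return func(*args, **kwargs)
        return wrapper

Tests can then assert the behavior with pytest.warns(DeprecatedNDArrayAPIWarning) or suppress it with warnings.catch_warnings().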

@copy-pr-bot (bot) commented Oct 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@functools.wraps(func)
def wrapper(*args, **kwargs):
    warnings.warn(
        f"{func.__name__} api is deprecated. Please prefer cupy for array functions",
        DeprecatedNDArrayAPIWarning,
    )
Contributor
cupy arrays are much slower than DeviceNDArray because they require creating an external (i.e., non-numba-cuda-created) stream, so I'm not sure a recommendation for that is what we should do right now.

I was thinking that we could keep the top-level APIs (device_array etc.) and replace their internals with StridedMemoryView or something similar, so that folks can construct arrays as cheaply as possible.
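
For reference, a rough sketch of what that could look like on top of cuda.core's StridedMemoryView. The view_device_buffer helper is hypothetical, and I'm assuming the cuda.core.experimental.utils import path and the stream_ptr=-1 "no synchronization" convention:

    from cuda.core.experimental.utils import StridedMemoryView

    def view_device_buffer(obj, stream_ptr=-1):
        # Zero-copy view over anything exposing __cuda_array_interface__
        # or DLPack; stream_ptr=-1 requests no stream synchronization.
        view = StridedMemoryView(obj, stream_ptr=stream_ptr)
        # view.ptr, view.shape, view.strides, view.dtype, and
        # view.device_id would be enough to back a cheap
        # DeviceNDArray-like container.
        return view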

Contributor

Here's the current state of the art:

[image attachment]

Contributor Author

I concur that a lightweight device-array-like container should exist; I'm just not sure that numba-cuda should necessarily be the library providing it publicly. I think we should nudge users away from using numba-cuda for that kind of thing, like moving data from host to device. That said, I'm open to suggestions on what we should recommend.
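
For what it's worth, the CuPy equivalents of the deprecated helpers are short (assuming cupy is installed; this is the kind of recommendation under discussion, not a settled one):

    import numpy as np
    import cupy as cp

    host = np.arange(1024, dtype=np.float32)

    d_arr = cp.asarray(host)   # host -> device, roughly cuda.to_device(host)
    back = cp.asnumpy(d_arr)   # device -> host, roughly d_arr.copy_to_host()

    # Copies are issued on the current stream, so an explicit stream looks like:
    stream = cp.cuda.Stream()
    with stream:
        d_arr2 = cp.asarray(host)
    stream.synchronize()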

@gmarkall added the '2 - In Progress' (currently a work in progress) label Oct 24, 2025
@rparolin added this to the next milestone Oct 24, 2025
@brandon-b-miller (Contributor Author)

/ok to test

@greptile-apps (bot) left a comment

2 files reviewed, 2 comments


Comment on lines +148 to 154

 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
     shape, strides, dtype = prepare_shape_strides_dtype(
         shape, strides, dtype, order
     )
-    return devicearray.DeviceNDArray(
+    return devicearray.DeviceNDArray._create_nowarn(
         shape=shape, strides=strides, dtype=dtype, stream=stream
     )

[P0] The internal _device_array function is missing the @require_context decorator. This is inconsistent with the same function in _api.py (line 143) which has the decorator.

Since DeviceNDArray._create_nowarn requires an active CUDA context (it calls devices.get_context().memalloc() when allocating memory), calling this function without an active context will cause a runtime error. This function is used internally by the public device_array function which has the decorator via the outer function, but _device_array itself can be called directly from other modules (e.g., from cuda.kernels.reduction, cuda.vectorizers, cuda.random, etc.).

Suggested change

+@require_context
 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
     shape, strides, dtype = prepare_shape_strides_dtype(
         shape, strides, dtype, order
     )
     return devicearray.DeviceNDArray._create_nowarn(
         shape=shape, strides=strides, dtype=dtype, stream=stream
     )

Comment on lines +262 to +264

cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)

[P2] Unnecessary complexity: converting res through __cuda_array_interface__ when it's already sliceable.

The res parameter is a device array that already supports slicing operations. The old code simply used res[:1] which worked correctly. Converting it through _from_cuda_array_interface(res.__cuda_array_interface__) adds unnecessary overhead and complexity without any functional benefit.

Since res implements the CUDA array interface, it can be sliced directly. The [:1] operation will work on any object that implements __getitem__ properly, including DeviceNDArray and other CUDA-compatible arrays.

Suggested change

-cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
-    :1
-].copy_to_device(partials[:1], stream=stream)
+res[:1].copy_to_device(partials[:1], stream=stream)


@greptile-apps (bot) left a comment

2 files reviewed, 2 comments


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):

[P0] Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which internally calls devices.get_context().memalloc() (line 123 in devicearray.py), requiring an active CUDA context. Without this decorator, the function will fail with a context error when called without an active context. The equivalent function in _api.py (line 143) correctly has this decorator.

Comment on lines +262 to +264

cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)

[P2] Unnecessary complexity. res is already a sliceable device array (CuPy array as shown in tests), so res[:1] works directly. Converting through __cuda_array_interface__ and _from_cuda_array_interface() adds overhead without benefit. The original code res[:1].copy_to_device(partials[:1], stream=stream) was simpler and equivalent.


@brandon-b-miller (Contributor Author)

/ok to test

@greptile-apps (bot) left a comment

75 files reviewed, 1 comment


Comment on lines +148 to 154

 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
     shape, strides, dtype = prepare_shape_strides_dtype(
         shape, strides, dtype, order
     )
-    return devicearray.DeviceNDArray(
+    return devicearray.DeviceNDArray._create_nowarn(
         shape=shape, strides=strides, dtype=dtype, stream=stream
     )

logic: Missing @require_context decorator. The function calls DeviceNDArray._create_nowarn() which requires an active CUDA context (allocates memory via devices.get_context().memalloc()). The equivalent function in _api.py (line 143) has this decorator.

@brandon-b-miller (Contributor Author)

/ok to test

@brandon-b-miller (Contributor Author)

/ok to test

@greptile-apps (bot) left a comment

79 files reviewed, 2 comments


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):

syntax: Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which allocates memory via devices.get_context().memalloc() (devicearray.py:123), requiring an active CUDA context. The equivalent function in _api.py (line 143) correctly has this decorator.

Suggested change

+@require_context
 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):

Comment on lines +262 to +264

cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)

style: Unnecessary complexity. res is already a sliceable device array (supports __cuda_array_interface__), so converting through _from_cuda_array_interface() adds overhead without benefit. Direct slicing works:

Suggested change

-cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
-    :1
-].copy_to_device(partials[:1], stream=stream)
+res[:1].copy_to_device(partials[:1], stream=stream)


@brandon-b-miller (Contributor Author)

/ok to test

@greptile-apps (bot) left a comment

2 files reviewed, 2 comments


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):

Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which allocates memory via devices.get_context().memalloc(), requiring an active CUDA context. The equivalent function in _api.py (line 143) has this decorator.

Suggested change

+@require_context
 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):

Comment on lines +262 to +264

cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)

Unnecessary complexity. res is already a sliceable device array (CuPy array in tests), so direct slicing works:

res[:1].copy_to_device(partials[:1], stream=stream)

Converting through __cuda_array_interface__ and _from_cuda_array_interface() adds overhead without benefit.

@brandon-b-miller (Contributor Author)

/ok to test

@greptile-apps (bot) left a comment

1 file reviewed, 1 comment


Comment on lines +148 to 154

 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
     shape, strides, dtype = prepare_shape_strides_dtype(
         shape, strides, dtype, order
     )
-    return devicearray.DeviceNDArray(
+    return devicearray.DeviceNDArray._create_nowarn(
         shape=shape, strides=strides, dtype=dtype, stream=stream
     )

Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which requires an active CUDA context (allocates memory via devices.get_context().memalloc() at devicearray.py:126). The equivalent function in _api.py:143 correctly has this decorator.

Suggested change

+@require_context
 def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
     shape, strides, dtype = prepare_shape_strides_dtype(
         shape, strides, dtype, order
     )
     return devicearray.DeviceNDArray._create_nowarn(
         shape=shape, strides=strides, dtype=dtype, stream=stream
     )

@brandon-b-miller (Contributor Author)

/ok to test

@greptile-apps (bot) left a comment

No files reviewed, no comments


@brandon-b-miller (Contributor Author)

/ok to test

@greptile-apps (bot) left a comment

2 files reviewed, 2 comments


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):

Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which allocates memory via devices.get_context().memalloc() (devicearray.py:126), requiring an active CUDA context. The equivalent function in _api.py:143 has this decorator.

Comment on lines +262 to +264

cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)

Unnecessary complexity. res already supports slicing (it implements __cuda_array_interface__). Direct slicing works and is simpler: res[:1].copy_to_device(partials[:1], stream=stream)

