Use async memcpy for copying to device

``cudaMemcpyAsync`` follows standard stream semantics, so it is guaranteed to complete before any subsequent kernel launch in the same stream executes, and before any synchronisation point (e.g. a synchronous memcpy to host). I need to think a little more about this to be sure, but I *think* this means it would be safe to switch to ``cudaMemcpyAsync`` for all ``pushXXXToDevice`` operations, which should significantly reduce synchronisation overhead when streaming data from host to device.
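A minimal sketch of the ordering argument, assuming everything is issued on the default stream (stream ``0``). ``pushStateToDevice`` and ``scale`` are hypothetical stand-ins, not the real ``pushXXXToDevice`` code. Two caveats worth checking before switching over: the ordering only holds within one stream (or via the legacy default stream), so pushes on a stream created with ``cudaStreamNonBlocking`` would *not* be waited on by a synchronous ``cudaMemcpy``; and with pinned host memory (``cudaMallocHost``) the host buffer must not be modified until the copy has completed, whereas with pageable memory the call returns once the data has been staged, so the gain is smaller.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Trivial kernel that consumes the uploaded data.
__global__ void scale(float *data, size_t n, float factor)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical stand-in for a pushXXXToDevice function: it only enqueues
// the copy on `stream` and returns without waiting for it to finish.
void pushStateToDevice(float *d_state, const float *h_state, size_t n,
                       cudaStream_t stream)
{
    CHECK_CUDA(cudaMemcpyAsync(d_state, h_state, n * sizeof(float),
                               cudaMemcpyHostToDevice, stream));
}

// Returns the number of elements that did not come back as 2 * input.
size_t runAsyncPushExample()
{
    const size_t n = 1 << 20;
    const cudaStream_t stream = 0;  // default stream, as the existing code would use

    // Pinned host memory, so the copy is genuinely asynchronous. The host
    // must not modify h_state until the copy has completed.
    float *h_state = nullptr;
    CHECK_CUDA(cudaMallocHost(reinterpret_cast<void **>(&h_state), n * sizeof(float)));
    for (size_t i = 0; i < n; i++) h_state[i] = static_cast<float>(i);

    float *d_state = nullptr;
    CHECK_CUDA(cudaMalloc(reinterpret_cast<void **>(&d_state), n * sizeof(float)));

    pushStateToDevice(d_state, h_state, n, stream);

    // Same stream: the kernel cannot start until the copy above has finished.
    const unsigned int threads = 256;
    const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);
    scale<<<blocks, threads, 0, stream>>>(d_state, n, 2.0f);
    CHECK_CUDA(cudaGetLastError());

    // A synchronous copy back to the host is a synchronisation point: it
    // waits for the upload and the kernel before returning.
    std::vector<float> result(n);
    CHECK_CUDA(cudaMemcpy(result.data(), d_state, n * sizeof(float),
                          cudaMemcpyDeviceToHost));

    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++)
        if (result[i] != 2.0f * h_state[i]) mismatches++;

    CHECK_CUDA(cudaFree(d_state));
    CHECK_CUDA(cudaFreeHost(h_state));
    return mismatches;
}
```

Calling ``runAsyncPushExample()`` should return ``0`` if the kernel really did see the fully uploaded data.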

Furthermore, ``allocateMem`` and ``freeMem`` could almost certainly be sped up by using ``cudaMallocAsync`` and ``cudaFreeAsync`` (with a barrier at the end of each function for safety).
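A rough sketch of what this could look like, assuming CUDA 11.2 or later and a device that reports ``cudaDevAttrMemoryPoolsSupported``. ``DeviceBuffers``, ``configureDefaultPool`` and the ``allocateMem``/``freeMem`` bodies here are hypothetical illustrations, not the existing functions. One thing to watch: the default pool's release threshold is 0, so memory freed into it is handed back at the next synchronisation, which the barrier in ``freeMem`` would trigger every time; raising the threshold keeps it cached for the next ``allocateMem``.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Hypothetical container for the buffers allocateMem/freeMem manage.
struct DeviceBuffers {
    float *a = nullptr;
    float *b = nullptr;
    size_t n = 0;
};

// Keep freed memory cached in the device's default pool instead of
// releasing it at every synchronisation (default threshold is 0).
void configureDefaultPool(int device)
{
    cudaMemPool_t pool;
    CHECK_CUDA(cudaDeviceGetDefaultMemPool(&pool, device));
    uint64_t threshold = UINT64_MAX;
    CHECK_CUDA(cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold, &threshold));
}

void allocateMem(DeviceBuffers &buf, size_t n, cudaStream_t stream)
{
    buf.n = n;
    CHECK_CUDA(cudaMallocAsync(reinterpret_cast<void **>(&buf.a), n * sizeof(float), stream));
    CHECK_CUDA(cudaMallocAsync(reinterpret_cast<void **>(&buf.b), n * sizeof(float), stream));
    // Barrier for safety: afterwards the pointers are usable from any stream.
    CHECK_CUDA(cudaStreamSynchronize(stream));
}

void freeMem(DeviceBuffers &buf, cudaStream_t stream)
{
    CHECK_CUDA(cudaFreeAsync(buf.a, stream));
    CHECK_CUDA(cudaFreeAsync(buf.b, stream));
    // Barrier for safety: the frees have completed when this returns.
    CHECK_CUDA(cudaStreamSynchronize(stream));
    buf = DeviceBuffers{};
}

// Returns true if an allocate/use/free round trip works, or if the
// device has no stream-ordered allocator support (nothing to test).
bool runPoolExample()
{
    int device = 0;
    CHECK_CUDA(cudaGetDevice(&device));
    int supported = 0;
    CHECK_CUDA(cudaDeviceGetAttribute(&supported, cudaDevAttrMemoryPoolsSupported, device));
    if (!supported) return true;
    configureDefaultPool(device);

    const cudaStream_t stream = 0;
    DeviceBuffers buf;
    allocateMem(buf, 1024, stream);
    CHECK_CUDA(cudaMemsetAsync(buf.a, 0, buf.n * sizeof(float), stream));

    std::vector<float> host(buf.n, 1.0f);
    CHECK_CUDA(cudaMemcpy(host.data(), buf.a, buf.n * sizeof(float),
                          cudaMemcpyDeviceToHost));
    freeMem(buf, stream);

    for (float v : host)
        if (v != 0.0f) return false;
    return buf.a == nullptr && buf.b == nullptr;
}
```

Whether the barriers can later be dropped depends on every user of these buffers running in the same stream as the allocation; that would need checking separately.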

Use async memcpy for copying to device #580