Use async memcpy for copying to device #580

@neworderofjamie

Description

cudaMemcpyAsync follows standard stream semantics, so it is guaranteed to complete before any subsequent kernel launch or synchronisation point in the same stream (e.g. a synchronous memcpy to host). I need to think a little more about this to be sure, but I think this means it would be safe to switch to cudaMemcpyAsync for all pushXXXToDevice operations, which should significantly reduce synchronisation overhead when streaming data from host to device.
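To make the ordering argument concrete, here is a minimal sketch rather than the actual pushXXXToDevice implementation. The kernel `addOne` and the helper `pushAsyncRoundTrip` are hypothetical names made up for illustration. It also assumes the host buffer is pinned (allocated with `cudaMallocHost`), since with pageable memory the copy may not overlap with host work, and that the host buffer is not overwritten until the copy has completed.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: adds one to every element so we can tell it saw the pushed data
__global__ void addOne(float *d, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d[i] += 1.0f;
    }
}

// Hypothetical helper: pushes n floats asynchronously, runs a kernel on them and
// pulls them back. Returns true if every element comes back as its input plus one.
bool pushAsyncRoundTrip(int n)
{
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *h = nullptr;
    float *d = nullptr;

    // Pinned host memory (an assumption of this sketch, see lead-in)
    if (cudaMallocHost(&h, bytes) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(&d, bytes) != cudaSuccess) {
        cudaFreeHost(h);
        return false;
    }
    for (int i = 0; i < n; i++) {
        h[i] = static_cast<float>(i);
    }

    // Asynchronous push in the default stream: returns without waiting for the copy
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, 0);

    // Kernel launched in the same stream, so it is ordered after the copy
    addOne<<<(n + 255) / 256, 256>>>(d, n);

    // Synchronous pull in the same stream: ordered after the kernel and blocks the host
    const cudaError_t err = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; i++) {
        ok = (h[i] == static_cast<float>(i) + 1.0f);
    }

    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}
```

Calling `pushAsyncRoundTrip(1 << 20)` from `main` should return true on a working device. There is no explicit synchronisation between the push and the kernel; the ordering comes purely from all three operations being issued to the same stream.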

Furthermore, allocateMem and freeMem could almost certainly be sped up by using cudaMallocAsync and cudaFreeAsync (with a barrier at the end of each function for safety).
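A rough sketch of what that could look like, assuming CUDA 11.2 or later and a device that supports memory pools. The helper `allocateAndFreeAsync` is a hypothetical stand-in for the pattern, not the generated allocateMem and freeMem code, and it uses an explicitly created stream purely for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocates numArrays buffers with the stream-ordered allocator,
// then frees them again, with a barrier after each phase.
// Returns false if the device lacks memory pool support or any call fails.
bool allocateAndFreeAsync(int numArrays, size_t bytes)
{
    int device = 0;
    int poolsSupported = 0;
    if (cudaGetDevice(&device) != cudaSuccess) {
        return false;
    }
    cudaDeviceGetAttribute(&poolsSupported, cudaDevAttrMemoryPoolsSupported, device);
    if (!poolsSupported) {
        return false;
    }

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        return false;
    }

    bool ok = true;
    std::vector<void*> arrays(numArrays, nullptr);

    // "allocateMem" phase: queue every allocation without waiting in between
    for (int i = 0; i < numArrays; i++) {
        ok = ok && (cudaMallocAsync(&arrays[i], bytes, stream) == cudaSuccess);
    }
    // Barrier so all pointers are usable from any stream once this phase returns
    ok = ok && (cudaStreamSynchronize(stream) == cudaSuccess);

    // "freeMem" phase: queue every free, skipping any allocation that failed
    for (int i = 0; i < numArrays; i++) {
        if (arrays[i] != nullptr) {
            ok = (cudaFreeAsync(arrays[i], stream) == cudaSuccess) && ok;
        }
    }
    // Barrier so the frees have completed before the function returns
    ok = (cudaStreamSynchronize(stream) == cudaSuccess) && ok;

    cudaStreamDestroy(stream);
    return ok;
}
```

For example, `allocateAndFreeAsync(4, 1 << 20)` allocates and frees four 1 MiB buffers. The single barrier at the end of each phase replaces the implicit synchronisation that each individual cudaMalloc or cudaFree call might otherwise incur.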
