cudaMemcpyAsync follows standard stream semantics, so a copy is guaranteed to complete before any subsequent kernel launch or synchronisation point in the same stream (e.g. a synchronous memcpy to host). I need to think a little more about this to be sure, but I think this means it would be safe to switch to cudaMemcpyAsync for all pushXXXToDevice operations, which should significantly reduce synchronisation overhead when streaming data from host to device.
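To make the ordering argument concrete, here is a minimal sketch rather than the actual generated code. The names `pushToDeviceAsync`, `addOne` and `runAsyncPushExample` are hypothetical, everything runs on the default stream, and the host buffer is assumed to be pinned (allocated with `cudaMallocHost`) so that the copy can genuinely overlap with host work.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for a simulation kernel that reads pushed data
__global__ void addOne(float *d_data, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        d_data[i] += 1.0f;
    }
}

// Hypothetical stand-in for a pushXXXToDevice function: enqueues the copy on
// the default stream (0) instead of blocking in cudaMemcpy
cudaError_t pushToDeviceAsync(float *d_data, const float *h_data, int n)
{
    return cudaMemcpyAsync(d_data, h_data, n * sizeof(float),
                           cudaMemcpyHostToDevice, 0);
}

// Returns true if the kernel observed the pushed data
bool runAsyncPushExample()
{
    const int n = 1024;
    float *h_data = nullptr;
    float *d_data = nullptr;

    // Assumption: pinned host memory, so the copy can be truly asynchronous
    if (cudaMallocHost(reinterpret_cast<void**>(&h_data), n * sizeof(float)) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(reinterpret_cast<void**>(&d_data), n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h_data);
        return false;
    }
    for (int i = 0; i < n; i++) {
        h_data[i] = static_cast<float>(i);
    }

    // Enqueue the copy, then launch a kernel on the same stream:
    // stream ordering means the kernel starts only after the copy completes
    bool ok = (pushToDeviceAsync(d_data, h_data, n) == cudaSuccess);
    addOne<<<(n + 255) / 256, 256>>>(d_data, n);
    ok = ok && (cudaGetLastError() == cudaSuccess);

    // Synchronous copy back to host: waits for the preceding copy and kernel
    std::vector<float> h_result(n);
    ok = ok && (cudaMemcpy(h_result.data(), d_data, n * sizeof(float),
                           cudaMemcpyDeviceToHost) == cudaSuccess);

    for (int i = 0; i < n; i++) {
        ok = ok && (h_result[i] == static_cast<float>(i) + 1.0f);
    }

    cudaFree(d_data);
    cudaFreeHost(h_data);
    return ok;
}
```

Calling `runAsyncPushExample()` from `main` exercises the whole sequence. One thing this sketch does not cover: with pinned memory, the host must not overwrite `h_data` between the push and the next synchronisation point, which is probably the part that needs the extra thought.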
Furthermore, allocateMem and freeMem could almost certainly be sped up by using cudaMallocAsync and cudaFreeAsync (with a barrier at the end of each function for safety).
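A rough sketch of the allocation idea, under some assumptions: CUDA 11.2 or later, a device that reports `cudaDevAttrMemoryPoolsSupported`, and the default stream. The functions `allocateMemSketch`, `freeMemSketch` and `runAsyncAllocExample` are hypothetical stand-ins, not the real allocateMem and freeMem, and the barrier is a single `cudaStreamSynchronize` at the end of each function.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical stand-in for allocateMem: queue every allocation on one stream,
// then wait once at the end rather than once per allocation
bool allocateMemSketch(void **d_a, void **d_b, size_t bytes, cudaStream_t stream)
{
    if (cudaMallocAsync(d_a, bytes, stream) != cudaSuccess) {
        return false;
    }
    if (cudaMallocAsync(d_b, bytes, stream) != cudaSuccess) {
        return false;
    }
    // Barrier: after this returns, both pointers are usable from any stream
    return cudaStreamSynchronize(stream) == cudaSuccess;
}

// Hypothetical stand-in for freeMem, using the same pattern
bool freeMemSketch(void *d_a, void *d_b, cudaStream_t stream)
{
    if (cudaFreeAsync(d_a, stream) != cudaSuccess) {
        return false;
    }
    if (cudaFreeAsync(d_b, stream) != cudaSuccess) {
        return false;
    }
    // Barrier: the frees have been processed once this returns
    return cudaStreamSynchronize(stream) == cudaSuccess;
}

// Returns true if allocation, use and free all succeed
bool runAsyncAllocExample()
{
    // Stream-ordered allocation needs memory pool support on the device
    int device = 0;
    int poolsSupported = 0;
    if (cudaGetDevice(&device) != cudaSuccess) {
        return false;
    }
    if (cudaDeviceGetAttribute(&poolsSupported, cudaDevAttrMemoryPoolsSupported,
                               device) != cudaSuccess || !poolsSupported) {
        return false;
    }

    const size_t bytes = 1024 * sizeof(float);
    cudaStream_t stream = 0;  // default stream, an assumption of this sketch
    void *d_a = nullptr;
    void *d_b = nullptr;

    if (!allocateMemSketch(&d_a, &d_b, bytes, stream)) {
        return false;
    }

    // Touch the memory to show it is usable after the barrier
    bool ok = (cudaMemsetAsync(d_a, 0, bytes, stream) == cudaSuccess);
    ok = ok && (cudaMemsetAsync(d_b, 0, bytes, stream) == cudaSuccess);

    return freeMemSketch(d_a, d_b, stream) && ok;
}
```

Whether this is actually faster than `cudaMalloc` and `cudaFree` would need measuring. The sketch only shows that the calls can be batched behind one barrier per function.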