@Daoming-Chen
Summary

This PR optimizes the PseudoInverseIK solver performance through several targeted improvements:

  • Eliminate redundant FK computation: The calc_jacobian function was computing forward kinematics twice per iteration; it now reuses the transform accumulated during the Jacobian loop.
  • Cache regularization matrix: Pre-compute reg = regularization * torch.eye(6) once in __init__ instead of every iteration.
  • Add torch.compile support: Optional JIT compilation for PyTorch 2.0+ via use_compile=True parameter.
  • Reduce memory allocations: Pre-allocate delta pose buffer and use in-place tensor operations where safe.
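The first two items above can be sketched together: the regularization matrix is built once at construction time, then reused in every damped-least-squares step. This is a minimal illustration with hypothetical names (`PseudoInverseStep`, `compute_dq`); the actual solver lives in pytorch_kinematics and differs in detail.

```python
import torch

class PseudoInverseStep:
    """Illustrative damped pseudoinverse step with a cached regularizer."""

    def __init__(self, regularization=1e-9, device="cpu"):
        # Cached once in __init__ instead of rebuilt every iteration.
        self.reg = regularization * torch.eye(6, device=device)

    def compute_dq(self, J, dx):
        # Damped least squares: dq = J^T (J J^T + reg)^{-1} dx
        JJT = J @ J.transpose(-2, -1)
        return J.transpose(-2, -1) @ torch.linalg.solve(JJT + self.reg, dx)

torch.manual_seed(0)
step = PseudoInverseStep()
J = torch.randn(6, 6)    # stand-in Jacobian for a 6-DOF arm
dx = torch.randn(6, 1)   # stand-in task-space error
dq = step.compute_dq(J, dx)
```

Because the regularizer is tiny, `J @ dq` closely reproduces `dx` whenever `J` is well-conditioned; the cache simply avoids re-allocating `torch.eye(6)` on every iteration.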

Performance Results

Benchmarked on UR5 robot with 500 goals, 10 retries, 30 max iterations:

| Device | Before    | After     | Speedup      |
|--------|-----------|-----------|--------------|
| CPU    | 472.43 ms | 354.35 ms | 1.33x faster |
| CUDA   | 172.30 ms | 104.88 ms | 1.64x faster |
  • No regression in convergence rate (~79% on CPU, ~78% on CUDA)
  • No change in solution quality

API Changes

New optional parameter for PseudoInverseIK:

ik = pk.PseudoInverseIK(chain, use_compile=True, ...)  # Enable torch.compile (PyTorch 2.0+)
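Internally, an opt-in flag like this is typically just a conditional wrap of the hot kernel in `torch.compile`. The sketch below shows that gating pattern; the function and factory names (`make_compute_dq`, `compute_dq`) are illustrative, not the library's API.

```python
import torch

def make_compute_dq(use_compile: bool = False):
    """Return the dq kernel, optionally JIT-compiled (hypothetical sketch)."""

    def compute_dq(J, dx, reg):
        # Damped least squares: dq = J^T (J J^T + reg)^{-1} dx
        JJT = J @ J.transpose(-2, -1) + reg
        return J.transpose(-2, -1) @ torch.linalg.solve(JJT, dx)

    if use_compile and hasattr(torch, "compile"):  # torch.compile is PyTorch 2.0+
        compute_dq = torch.compile(compute_dq)
    return compute_dq

# Eager path; pass use_compile=True on PyTorch 2.0+ to enable JIT compilation.
compute_dq = make_compute_dq(use_compile=False)
```

Guarding on `hasattr(torch, "compile")` keeps the eager path working on pre-2.0 installs, which matches the "optional" framing of the parameter.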

Test plan

  • All existing tests pass (pytest tests/)
  • Benchmarked on CPU and CUDA
  • Verified convergence rate unchanged
  • Verified solution quality unchanged (< 1e-6 difference)

…on improvements

Key optimizations:
- Add torch.compile support for compute_dq kernel (PyTorch 2.0+)
- Pre-allocate dx buffer in delta_pose to reduce tensor concatenation
- Use in-place tensor addition for q updates
- Cache regularization matrix in __init__ instead of creating each iteration
- Eliminate redundant FK call in jacobian calculation
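The buffer and in-place items in the list above amount to reusing one pre-allocated 6-vector for the pose error instead of calling torch.cat each iteration, and updating q with an in-place add. A minimal sketch with hypothetical names (`delta_pose_into`):

```python
import torch

def delta_pose_into(dx, pos_err, rot_err):
    # Write position and rotation error into a pre-allocated buffer
    # instead of allocating a new tensor via torch.cat every iteration.
    dx[..., :3] = pos_err
    dx[..., 3:] = rot_err
    return dx

dx = torch.empty(6)                       # allocated once, outside the loop
delta_pose_into(dx, torch.ones(3), torch.zeros(3))

q = torch.zeros(2, 3)                     # batch of joint configurations
dq = torch.full((2, 3), 0.1)
q.add_(dq)                                # in-place update; no new q allocated
```

In-place updates are safe here because the previous q is not needed after the step and no autograd graph is being recorded through it.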

Performance improvements (UR5, 500 goals, 10 retries, 30 iterations):
- CPU: 1.33x faster (472ms → 354ms)
- CUDA: 1.64x faster (172ms → 105ms)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
