@dkeven commented Jan 5, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, a device that has been marked unhealthy never recovers. In the rare cases where the cause is only an NVML connection error, this is unacceptable.
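For illustration only, here is a minimal sketch of the recovery idea described above. It is not the code in this PR. `HealthTracker`, `ProbeFunc`, `RecheckOnce`, and `ErrProbeFailed` are hypothetical names, and the probe is injected rather than calling NVML, so a real plugin would presumably wrap an NVML query there.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrProbeFailed is a hypothetical sentinel error returned by the demo probe.
var ErrProbeFailed = errors.New("probe failed")

// ProbeFunc is a hypothetical hook that returns nil when a device is usable again.
// Assumption: a real plugin would back this with an NVML query.
type ProbeFunc func(deviceID string) error

// HealthTracker records which devices are currently marked unhealthy.
type HealthTracker struct {
	mu        sync.Mutex
	unhealthy map[string]bool
	probe     ProbeFunc
}

// NewHealthTracker returns a tracker in which every device starts out healthy.
func NewHealthTracker(probe ProbeFunc) *HealthTracker {
	return &HealthTracker{unhealthy: make(map[string]bool), probe: probe}
}

// MarkUnhealthy flags a device as unhealthy.
func (t *HealthTracker) MarkUnhealthy(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.unhealthy[id] = true
}

// Healthy reports whether a device is currently considered healthy.
func (t *HealthTracker) Healthy(id string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return !t.unhealthy[id]
}

// RecheckOnce probes every unhealthy device and clears the flag for each one
// whose probe succeeds. It returns the IDs that recovered, in no fixed order.
func (t *HealthTracker) RecheckOnce() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var recovered []string
	for id := range t.unhealthy {
		if t.probe(id) == nil {
			delete(t.unhealthy, id) // deleting during range is safe in Go
			recovered = append(recovered, id)
		}
	}
	return recovered
}

func main() {
	nvmlUp := false // simulates a transient NVML connection error
	tracker := NewHealthTracker(func(string) error {
		if !nvmlUp {
			return ErrProbeFailed
		}
		return nil
	})

	tracker.MarkUnhealthy("GPU-0")
	fmt.Println(tracker.RecheckOnce(), tracker.Healthy("GPU-0"))

	nvmlUp = true // the connection comes back
	fmt.Println(tracker.RecheckOnce(), tracker.Healthy("GPU-0"))
}
```

In a long-running plugin, `RecheckOnce` would likely be driven by a ticker. The sketch holds the lock while probing for simplicity, so a slow probe would want a different locking scheme.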

@dkeven merged commit 7977110 into feat/nvshare on Jan 5, 2026
1 check passed
@dkeven deleted the fix/device_recovery branch on January 5, 2026, 08:06
2 participants