
Commit ad52f60

nvmeof: treat "connecting" state as valid in path detection
When checking whether a path to a gateway already exists, treat both "live" and "connecting" states as valid connections that should not be re-attempted.

The "connecting" state indicates that the kernel NVMe driver is actively trying to establish or re-establish a connection. This occurs in scenarios such as:
- Initial connection establishment
- Gateway temporarily unavailable and the kernel retrying
- Subsystem deleted and recreated on the gateway

The kernel's ctrl_loss_tmo mechanism keeps retrying for up to 30 minutes (set by the -l parameter of the nvme connect command). Running nvme connect while a path is in the "connecting" state results in "already connected" errors and can cause volume attachment failures during create/delete cycles.

By treating "connecting" as a valid state, we let the kernel's retry logic handle reconnection automatically without interference.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
1 parent 80c0474 commit ad52f60


1 file changed: +16 -1 lines changed


internal/nvmeof/nvmeof_initiator.go

Lines changed: 16 additions & 1 deletion
```diff
@@ -261,10 +261,25 @@ func (nhc nvmeHostConnections) hasLivePathToGateway(subsystemNQN, hostNQN,
 			continue
 		}
 
+		// loop through paths to find matching path
 		for _, path := range subsys.Paths {
+			// Check if the path matches the gateway IP and port
+			// and is in a usable state:
+			// - "live": connection is active and working
+			// - "connecting": kernel is actively trying to (re)connect
+			//
+			// The "connecting" state occurs when:
+			// 1. Initial connection is being established
+			// 2. Connection lost and kernel is retrying (ctrl_loss_tmo in effect)
+			// 3. Subsystem was deleted/recreated on the gateway
+			//
+			// In all cases, the kernel's retry mechanism handles reconnection
+			// for up to ctrl_loss_tmo seconds, so we should not attempt another
+			// connection which would fail with "already connected" error.
 			if path.Address.Traddr == gatewayIP &&
 				path.Address.Trsvcid == gatewayPort &&
-				path.State == "live" {
+				(path.State == "live" ||
+					path.State == "connecting") {
 				return true
 			}
 		}
```
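A self-contained sketch of the path check after this change, useful for exercising the new behavior in isolation. The struct types are hypothetical stand-ins that only reuse the field names visible in the diff (`Paths`, `Address.Traddr`, `Address.Trsvcid`, `State`), and `hasUsablePath` is a hypothetical name rather than the real method. One deliberate difference from the commit: the accepted states are kept in a set (`usablePathStates`) instead of an inline `||` chain, so adding or removing a state is a one-line change.

```go
package main

import "fmt"

// Hypothetical, trimmed-down types mirroring the field names used in the diff.
type pathAddress struct {
	Traddr  string
	Trsvcid string
}

type nvmePath struct {
	Address pathAddress
	State   string
}

type subsystem struct {
	Paths []nvmePath
}

// usablePathStates lists the controller states treated as an existing
// connection: "live" (working) and "connecting" (kernel retrying).
var usablePathStates = map[string]bool{
	"live":       true,
	"connecting": true,
}

// hasUsablePath reports whether any path of subsys points at
// gatewayIP:gatewayPort and is in one of the usable states.
func hasUsablePath(subsys subsystem, gatewayIP, gatewayPort string) bool {
	for _, path := range subsys.Paths {
		if path.Address.Traddr == gatewayIP &&
			path.Address.Trsvcid == gatewayPort &&
			usablePathStates[path.State] {
			return true
		}
	}
	return false
}

func main() {
	subsys := subsystem{
		Paths: []nvmePath{
			{Address: pathAddress{Traddr: "192.0.2.10", Trsvcid: "4420"}, State: "connecting"},
			{Address: pathAddress{Traddr: "192.0.2.11", Trsvcid: "4420"}, State: "resetting"},
		},
	}
	fmt.Println(hasUsablePath(subsys, "192.0.2.10", "4420")) // true: kernel is reconnecting
	fmt.Println(hasUsablePath(subsys, "192.0.2.11", "4420")) // false: state not in the set
	fmt.Println(hasUsablePath(subsys, "192.0.2.12", "4420")) // false: no path to this gateway
}
```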
