
nvmeof: treat "connecting" state as valid in path detection #5974

Merged
mergify[bot] merged 1 commit into ceph:devel from gadididi:nvmeof/fix_nodesever_already_connected on Jan 27, 2026

Conversation

@gadididi (Contributor) commented Jan 26, 2026

When checking whether a path to a gateway already exists, treat both the "live" and "connecting" states as valid connections that should not be re-attempted.

The "connecting" state indicates the NVMe kernel is actively trying to establish or re-establish a connection, which occurs in scenarios like:

  • Initial connection establishment
  • Gateway temporarily unavailable and the kernel retrying
  • Subsystem/host deleted and recreated on the gateway

The kernel's ctrl_loss_tmo mechanism will continue retry attempts for up to 30 minutes (set by the -l 1800 parameter of the nvme connect command). Attempting nvme connect while a path is in the "connecting" state results in "already connected" errors and can cause volume attachment failures during create/delete cycles.

By treating "connecting" as a valid state, we allow the kernel's retry logic to handle reconnection automatically without interference.

Related issues

Fixes: #5964

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@gadididi requested review from a team and nixpanic January 26, 2026 12:38
@gadididi self-assigned this Jan 26, 2026
@gadididi added the bug (Something isn't working) and component/nvme-of (Issues and PRs related to NVMe-oF.) labels Jan 26, 2026
@gadididi force-pushed the nvmeof/fix_nodesever_already_connected branch from ad52f60 to 4dc16ff on January 26, 2026 13:41
  path.Address.Trsvcid == gatewayPort &&
- path.State == "live" {
+ (path.State == "live" ||
+  path.State == "connecting") {
Collaborator
Do we need to retry if it's in the connecting state until it goes to live?

Contributor Author
No, we don't need to retry. Here's why:
The kernel is already doing the retrying for us. When a path is in the "connecting" state, the NVMe kernel driver is actively attempting to establish the connection and will keep retrying automatically for up to 30 minutes (based on the ctrl_loss_tmo we set with -l 1800 in the nvme connect command).
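For context, a rough sketch of how such a connect invocation could be assembled with the 30-minute controller loss timeout. The flags shown (-t transport, -a traddr, -s trsvcid, -n subsystem NQN, -l ctrl-loss-tmo) are standard nvme-cli options, but the helper, address, port, and NQN below are purely illustrative and not the driver's actual code:

```go
package main

import (
	"fmt"
	"os/exec"
)

// buildNvmeConnectCmd is an illustrative helper (not ceph-csi's real code)
// that assembles an "nvme connect" command with a 30-minute ctrl_loss_tmo,
// so the kernel keeps retrying the path for up to 1800 seconds on its own.
func buildNvmeConnectCmd(gatewayAddress, gatewayPort, subsystemNQN string) *exec.Cmd {
	return exec.Command("nvme", "connect",
		"-t", "tcp", // transport type
		"-a", gatewayAddress, // gateway traddr
		"-s", gatewayPort, // gateway trsvcid
		"-n", subsystemNQN, // subsystem NQN to connect to
		"-l", "1800", // ctrl_loss_tmo: kernel retries for up to 30 minutes
	)
}

func main() {
	// Placeholder address, port, and NQN for illustration only.
	cmd := buildNvmeConnectCmd("192.168.1.10", "4420", "nqn.2016-06.io.spdk:example")
	fmt.Println(cmd.String())
}
```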

From the CSI driver's perspective, our job is just to ensure a connection attempt has been initiated for the next step, mounting.
Once we see a path in either the "live" or "connecting" state for our subsystem, we know the kernel has it under control.
What happens in practice:
this function is called from the nodeserver in NodeStageVolume(), right after ControllerPublishVolume() has added the host (if it needed to be added) via the AddHost gRPC call to the NVMe-oF gateway.

If the path is "live" --> great, the connection is working.
If the path is "connecting" --> there is a path but it is not connected yet (probably because the last NVMe-oF namespace was removed, so ControllerUnpublishVolume() removed the host and the subsystem). The kernel is retrying, and once ControllerPublishVolume() adds the host again the path becomes available; this check just runs before that has happened. The path will then automatically transition to "live" and the namespace will appear.

Another example:

  1. There was a connection with a live path for some NVMe-oF namespace. Then the admin deletes that pod and the PVC; the host and the subsystem are both deleted.
  2. 30 minutes pass.
  3. The admin wants to create a new PVC and then a new pod.
  4. nvme list-subsys will show neither "connecting" nor "live".
  5. nvme connect will run again.

Actually, I could rerun connect anyway, without this check. If the path is "live" or "connecting" I would get an "already connected" error, so basically I could parse the error code and stderr, recognize that case, move on, and not raise an error.
@Madhu-1 what do you think?

Collaborator
A couple of questions

  • If it's in the connecting state, will the application be able to read/write? I assume not. What is the use of mounting such a PVC if it cannot be used?
  • Will there be any problem during NodeUnstage?

> Actually, I could rerun connect anyway, without this check. If the path is "live" or "connecting" I would get an "already connected" error, so basically I could parse the error code and stderr, recognize that case, move on, and not raise an error.

If nvme list-subsys is not reliable, we should rerun the connect and discard known errors.

Contributor Author
> If it's in the connecting state, will the application be able to read/write? I assume not. What is the use of mounting such a PVC if it cannot be used?

The "connecting" state we're checking for appears right after the ControllerPublishVolume stage and before the mount stage.
Here's the timeline:

  1. CreateVolume - the volume is created on the gateway.

  2. ControllerPublishVolume - the host is added to the subsystem on the gateway.

  3. NodeStageVolume - this is where we check for existing connections.
     At this point, we call nvme list-subsys to see what's already connected.
     If we see "connecting", it means a previous operation already initiated the connection (for some previous PVC that has already been deleted).

  4. The kernel continues establishing the connection in the background.

  5. Before we actually mount, we wait for the namespace device to appear using GetNamespaceDeviceByUUID(), which retries until the device is accessible (a sketch of such a wait loop follows below).

  6. Only when the device exists and is accessible do we proceed to format and mount it.

So, the key point is: we don't mount a PVC that's stuck in the "connecting" state. We just avoid running nvme connect again (which would fail with "already connected"). We still wait for the device to become available before mounting.
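A minimal sketch of that kind of wait loop, under the assumption that device discovery boils down to a retried lookup. waitForNamespaceDevice is a hypothetical stand-in for the real GetNamespaceDeviceByUUID() helper mentioned above, and the glob match is a deliberate simplification:

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// waitForNamespaceDevice is a hypothetical stand-in for the driver's
// GetNamespaceDeviceByUUID() helper: it polls until a block device for the
// namespace shows up (simplified here to a glob match) or the context expires.
func waitForNamespaceDevice(ctx context.Context, uuid string) (string, error) {
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()

	for {
		// Simplified discovery: the real helper matches the device by
		// namespace UUID via sysfs, not by a plain glob.
		matches, err := filepath.Glob("/dev/nvme*n*")
		if err == nil && len(matches) > 0 {
			return matches[0], nil
		}

		select {
		case <-ctx.Done():
			return "", fmt.Errorf("namespace device for uuid %s did not appear: %w", uuid, ctx.Err())
		case <-ticker.C:
			// Keep polling; the kernel may still be moving the path
			// from "connecting" to "live".
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Placeholder UUID for illustration only.
	dev, err := waitForNamespaceDevice(ctx, "3f9a0d2e-0000-0000-0000-000000000000")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("device ready:", dev)
}
```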

> If nvme list-subsys is not reliable, we should rerun the connect and discard known errors.

I don't think it is unreliable.
In some runs nvme list-subsys already shows "live"; I think it is a matter of timing. Parsing stderr would certainly work (and would be the easier route).
But I would rather avoid parsing the stderr string, since a future nvme-cli version might return a different string.
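For completeness, a rough sketch of the alternative discussed (and not adopted) here: always rerun nvme connect and tolerate the known failure. The "already connected" substring match is exactly the kind of brittle dependency on nvme-cli output wording the comment above warns about, and the helper and placeholder arguments are purely illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// connectIgnoringAlreadyConnected sketches the rejected approach: always run
// "nvme connect" and swallow the error when the kernel reports the path is
// already connected. Matching on stderr text is brittle because the wording
// may change between nvme-cli versions.
func connectIgnoringAlreadyConnected(args ...string) error {
	cmd := exec.Command("nvme", append([]string{"connect"}, args...)...)

	var stderr bytes.Buffer
	cmd.Stderr = &stderr

	err := cmd.Run()
	if err == nil {
		return nil
	}
	if strings.Contains(strings.ToLower(stderr.String()), "already connected") {
		// The kernel already has this path (live or connecting); treat as success.
		return nil
	}

	return fmt.Errorf("nvme connect failed: %w: %s", err, stderr.String())
}

func main() {
	// Hypothetical invocation; address, port, and NQN are placeholders.
	err := connectIgnoringAlreadyConnected(
		"-t", "tcp", "-a", "192.168.1.10", "-s", "4420",
		"-n", "nqn.2016-06.io.spdk:example", "-l", "1800")
	fmt.Println("result:", err)
}
```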

Member
I'm happy with this approach for now. If there is a need to always try to connect, we can do that at a later point in time too.

@Madhu-1 added the ci/skip/e2e (skip running e2e CI jobs) label Jan 27, 2026
@Madhu-1 (Collaborator) commented Jan 27, 2026

@Mergifyio queue

mergify bot commented Jan 27, 2026

queue

✅ The pull request has been merged automatically at 8df46e3

mergify bot commented Jan 27, 2026

Merge Queue Status

✅ The pull request has been merged at 4dc16ff

This pull request spent 35 minutes 10 seconds in the queue, including 34 minutes 52 seconds running CI.
The checks were run on draft #5981.

Required conditions to merge
  • #approved-reviews-by >= 2 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • any of:
    • all of:
      • base=devel
      • status-success=codespell
      • status-success=go-test
      • status-success=golangci-lint
      • status-success=lint-extras
      • status-success=mod-check
      • status-success=multi-arch-build
      • status-success=uncommitted-code-check
      • any of:
        • label=ci/skip/e2e
        • all of:
          • status-success=ci/centos/k8s-e2e-external-storage/1.32
          • status-success=ci/centos/k8s-e2e-external-storage/1.33
          • status-success=ci/centos/k8s-e2e-external-storage/1.34
          • status-success=ci/centos/mini-e2e-helm/k8s-1.32
          • status-success=ci/centos/mini-e2e-helm/k8s-1.33
          • status-success=ci/centos/mini-e2e-helm/k8s-1.34
          • status-success=ci/centos/mini-e2e/k8s-1.32
          • status-success=ci/centos/mini-e2e/k8s-1.33
          • status-success=ci/centos/mini-e2e/k8s-1.34
          • status-success=ci/centos/upgrade-tests-cephfs
          • status-success=ci/centos/upgrade-tests-rbd
    • all of:
      • base~=^(release-.+)$
      • status-success=codespell
      • status-success=go-test
      • status-success=golangci-lint
      • status-success=lint-extras
      • status-success=mod-check
      • status-success=multi-arch-build
      • status-success=uncommitted-code-check
      • any of:
        • label=ci/skip/e2e
        • all of:
          • status-success=ci/centos/k8s-e2e-external-storage/1.32
          • status-success=ci/centos/k8s-e2e-external-storage/1.33
          • status-success=ci/centos/k8s-e2e-external-storage/1.34
          • status-success=ci/centos/mini-e2e-helm/k8s-1.32
          • status-success=ci/centos/mini-e2e-helm/k8s-1.33
          • status-success=ci/centos/mini-e2e-helm/k8s-1.34
          • status-success=ci/centos/mini-e2e/k8s-1.32
          • status-success=ci/centos/mini-e2e/k8s-1.33
          • status-success=ci/centos/mini-e2e/k8s-1.34
          • status-success=ci/centos/upgrade-tests-cephfs
          • status-success=ci/centos/upgrade-tests-rbd
    • all of:
      • base=release-v3.15
      • status-success=codespell
      • status-success=go-test
      • status-success=golangci-lint
      • status-success=lint-extras
      • status-success=mod-check
      • status-success=multi-arch-build
      • status-success=uncommitted-code-check
      • any of:
        • label=ci/skip/e2e
        • all of:
          • status-success=ci/centos/k8s-e2e-external-storage/1.31
          • status-success=ci/centos/mini-e2e-helm/k8s-1.31
          • status-success=ci/centos/mini-e2e/k8s-1.31
          • status-success=ci/centos/k8s-e2e-external-storage/1.32
          • status-success=ci/centos/k8s-e2e-external-storage/1.33
          • status-success=ci/centos/mini-e2e-helm/k8s-1.32
          • status-success=ci/centos/mini-e2e-helm/k8s-1.33
          • status-success=ci/centos/mini-e2e/k8s-1.32
          • status-success=ci/centos/mini-e2e/k8s-1.33
          • status-success=ci/centos/upgrade-tests-cephfs
          • status-success=ci/centos/upgrade-tests-rbd
    • all of:
      • base=ci/centos
      • status-success=ci/centos/jjb-validate
      • status-success=ci/centos/job-validation

mergify bot added the queued label Jan 27, 2026
mergify bot added a commit that referenced this pull request Jan 27, 2026
mergify bot merged commit 8df46e3 into ceph:devel Jan 27, 2026
21 checks passed
mergify bot removed the queued label Jan 27, 2026

Labels

bug (Something isn't working), ci/skip/e2e (skip running e2e CI jobs), component/nvme-of (Issues and PRs related to NVMe-oF.)

Development

Successfully merging this pull request may close these issues.

Already Connected in Nodeserver

3 participants