Skip to content

Conversation

@CrystalChun
Copy link
Contributor

@CrystalChun CrystalChun commented Jan 12, 2026

In kube-api there are instances where a user might manually start the assisted installer agent before the Ironic Agent completes all that it needs to do. This leads to the host installing incorrectly.

To prevent this, we'll block the installation from starting until we know that Ironic Agent has completed if it is enabled.
If it's not enabled, the install should progress as normal.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Manual testing

With converged flow enabled

Blocking install

  1. Create late-binding BMH and InfraEnv with Ironic agent override image set to an incompatible CPU architecture (arm64) for amd64 machine
    annotations:
      infraenv.agent-install.openshift.io/ironic-agent-image-override: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d99b3160fad7da23af2f9dcee8a9804d3f23858f22b09ad1e1606fff80e756d
  1. Observe the BMH stuck in inspecting, log into host and start the assisted-service agent manually
  2. Attach Agent to cluster
  3. Observe that the Agent does not start installing and the BMH is stuck in inspecting

Happy flow

  1. Create late-binding BMH and InfraEnv with correct ironic image
  2. Observe the BMH move to provisioning and ironic agent finishing inspecting and starting assisted agent
  3. When the agent appears, bind it to a cluster
  4. Observe that the Agent starts and finishes installing as usual

Regression test without BMH

  1. Create InfraEnv and boot host from discovery ISO
  2. Observe Agent appearing
  3. Bind Agent to cluster
  4. Observe the Agent start and finish installation

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 12, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 12, 2026

@CrystalChun: This pull request references MGMT-20398 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

In kube-api there are instances where a user might manually start the assisted installer agent before the Ironic Agent completes all that it needs to do. This leads to the host installing incorrectly.

To prevent this, we'll block the installation from starting until we know that Ironic Agent has completed if it is enabled.
If it's not enabled, the install should progress as normal.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Jan 12, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 12, 2026
@openshift-ci
Copy link

openshift-ci bot commented Jan 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@CrystalChun
Copy link
Contributor Author

/test ?

@openshift-ci openshift-ci bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 12, 2026
@openshift-ci
Copy link

openshift-ci bot commented Jan 12, 2026

@CrystalChun: The following commands are available to trigger required jobs:

/test e2e-agent-compact-ipv4
/test edge-assisted-operator-catalog-publish-verify
/test edge-ci-index
/test edge-e2e-ai-operator-disconnected-capi
/test edge-e2e-ai-operator-ztp
/test edge-e2e-ai-operator-ztp-3masters
/test edge-e2e-ai-operator-ztp-capi
/test edge-e2e-ai-operator-ztp-disconnected
/test edge-e2e-metal-assisted-4-16
/test edge-e2e-metal-assisted-4-17
/test edge-e2e-metal-assisted-4-20
/test edge-e2e-metal-assisted-5-control-planes-4-20
/test edge-e2e-metal-assisted-external-4-20
/test edge-e2e-metal-assisted-lvm-4-20
/test edge-e2e-metal-assisted-none-4-20
/test edge-e2e-metal-assisted-openshift-ai-4-18
/test edge-e2e-metal-assisted-osc-4-20
/test edge-e2e-metal-assisted-virtualization-4-19
/test edge-e2e-metal-assisted-vlan-4-20
/test edge-e2e-vsphere-assisted-4-20
/test edge-images
/test edge-lint
/test edge-operator-publish-verify
/test edge-subsystem-aws
/test edge-subsystem-kubeapi-aws
/test edge-unit-test
/test edge-verify-generated-code
/test images
/test mce-images
/test okd-scos-images
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-agent-4control-ipv4
/test e2e-agent-5control-ipv4
/test e2e-agent-ha-dualstack
/test e2e-agent-sno-ipv6
/test edge-e2e-ai-operator-ztp-4masters
/test edge-e2e-ai-operator-ztp-5masters
/test edge-e2e-ai-operator-ztp-compact-day2-masters
/test edge-e2e-ai-operator-ztp-compact-day2-workers
/test edge-e2e-ai-operator-ztp-multiarch-3masters-ocp
/test edge-e2e-ai-operator-ztp-multiarch-sno-ocp
/test edge-e2e-ai-operator-ztp-node-labels
/test edge-e2e-ai-operator-ztp-remove-node
/test edge-e2e-ai-operator-ztp-sno-day2-masters
/test edge-e2e-ai-operator-ztp-sno-day2-workers
/test edge-e2e-ai-operator-ztp-sno-day2-workers-ignitionoverride
/test edge-e2e-ai-operator-ztp-sno-day2-workers-late-binding
/test edge-e2e-metal-assisted-4-control-planes-4-20
/test edge-e2e-metal-assisted-4-masters-none-4-20
/test edge-e2e-metal-assisted-bond-4-20
/test edge-e2e-metal-assisted-day2-4-20
/test edge-e2e-metal-assisted-day2-arm-workers-4-20
/test edge-e2e-metal-assisted-day2-sno-4-20
/test edge-e2e-metal-assisted-dual-primary-v6-compact-4-20
/test edge-e2e-metal-assisted-dual-stack-primary-ipv6-4-20
/test edge-e2e-metal-assisted-ha-kube-api-ipv4-4-20
/test edge-e2e-metal-assisted-ha-kube-api-ipv6-4-20
/test edge-e2e-metal-assisted-ipv4v6-4-20
/test edge-e2e-metal-assisted-ipv6-4-20
/test edge-e2e-metal-assisted-kube-api-late-binding-sno-4-20
/test edge-e2e-metal-assisted-kube-api-late-unbinding-sno-4-20
/test edge-e2e-metal-assisted-kube-api-umlb-4-20
/test edge-e2e-metal-assisted-onprem-4-20
/test edge-e2e-metal-assisted-osc-sno-4-20
/test edge-e2e-metal-assisted-sno-4-20
/test edge-e2e-metal-assisted-static-ip-suite-4-20
/test edge-e2e-metal-assisted-tang-4-20
/test edge-e2e-metal-assisted-tpmv2-4-20
/test edge-e2e-metal-assisted-umlb-4-20
/test edge-e2e-metal-assisted-upgrade-agent-4-20
/test edge-e2e-oci-assisted-4-20
/test edge-e2e-oci-assisted-bm-iscsi-4-20
/test edge-e2e-vsphere-assisted-umlb-4-20
/test edge-e2e-vsphere-assisted-umn-4-20
/test okd-scos-e2e-aws-ovn
/test push-pr-image

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-assisted-service-master-e2e-agent-compact-ipv4
pull-ci-openshift-assisted-service-master-edge-ci-index
pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-disconnected-capi
pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp
pull-ci-openshift-assisted-service-master-edge-e2e-ai-operator-ztp-capi
pull-ci-openshift-assisted-service-master-edge-e2e-metal-assisted-4-20
pull-ci-openshift-assisted-service-master-edge-images
pull-ci-openshift-assisted-service-master-edge-lint
pull-ci-openshift-assisted-service-master-edge-subsystem-aws
pull-ci-openshift-assisted-service-master-edge-subsystem-kubeapi-aws
pull-ci-openshift-assisted-service-master-edge-unit-test
pull-ci-openshift-assisted-service-master-edge-verify-generated-code
pull-ci-openshift-assisted-service-master-images
pull-ci-openshift-assisted-service-master-mce-images
pull-ci-openshift-assisted-service-master-okd-scos-images
pull-ci-openshift-assisted-service-master-verify-deps
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@CrystalChun
Copy link
Contributor Author

/test edge-subsystem-kubeapi-aws edge-subsystem-aws

@openshift-ci openshift-ci bot added the api-review Categorizes an issue or PR as actively needing an API review. label Jan 12, 2026
@openshift-ci
Copy link

openshift-ci bot commented Jan 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CrystalChun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 12, 2026
@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 12, 2026
@CrystalChun
Copy link
Contributor Author

/test edge-subsystem-kubeapi-aws edge-subsystem-aws

@CrystalChun
Copy link
Contributor Author

/test edge-subsystem-kubeapi-aws edge-subsystem-aws

@CrystalChun CrystalChun force-pushed the split-brain branch 3 times, most recently from a73f408 to a660c71 Compare January 12, 2026 22:11
@CrystalChun
Copy link
Contributor Author

/test edge-subsystem-kubeapi-aws edge-subsystem-aws

@CrystalChun
Copy link
Contributor Author

/test edge-subsystem-kubeapi-aws edge-subsystem-aws

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 12, 2026

@CrystalChun: This pull request references MGMT-20398 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

In kube-api there are instances where a user might manually start the assisted installer agent before the Ironic Agent completes all that it needs to do. This leads to the host installing incorrectly.

To prevent this, we'll block the installation from starting until we know that Ironic Agent has completed if it is enabled.
If it's not enabled, the install should progress as normal.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Manual testing

With converged flow enabled

Blocking install

  1. Create late-binding BMH and InfraEnv with Ironic agent override image set to an incompatible CPU architecture (arm64) for amd64 machine
   annotations:
     infraenv.agent-install.openshift.io/ironic-agent-image-override: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d99b3160fad7da23af2f9dcee8a9804d3f23858f22b09ad1e1606fff80e756d
  1. Observe the BMH stuck in inspecting, log into host and start the assisted-service agent manually
  2. Attach Agent to cluster
  3. Observe that the Agent does not start installing and the BMH is stuck in inspecting

Happy flow

  1. Create late-binding BMH and InfraEnv with correct ironic image
  2. Observe the BMH move to provisioning and ironic agent finishing inspecting and starting assisted agent
  3. When the agent appears, bind it to a cluster
  4. Observe that the Agent starts and finishes installing as usual

Regression test without BMH

  1. Create InfraEnv and boot host from discovery ISO
  2. Observe Agent appearing
  3. Bind Agent to cluster
  4. Observe the Agent start and finish installation

Checklist

  • Title and description added to both, commit and PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@CrystalChun CrystalChun marked this pull request as ready for review January 13, 2026 02:21
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 13, 2026
@CrystalChun
Copy link
Contributor Author

/cc @carbonin

@openshift-ci openshift-ci bot requested review from carbonin and danielerez January 13, 2026 02:21
@openshift-ci openshift-ci bot requested a review from eranco74 January 13, 2026 02:22
@codecov
Copy link

codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 15.87302% with 106 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.52%. Comparing base (13a784a) to head (2faea21).

⚠️ Current head 2faea21 differs from pull request most recent head 8258b3a

Please upload reports for the commit 8258b3a to get more accurate results.

Files with missing lines Patch % Lines
restapi/embedded_spec.go 0.00% 40 Missing ⚠️
...nternal/controller/controllers/agent_controller.go 23.52% 35 Missing and 4 partials ⚠️
internal/host/host.go 0.00% 15 Missing ⚠️
internal/bminventory/inventory.go 44.44% 8 Missing and 2 partials ⚠️
internal/host/hostcommands/install_cmd.go 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8743      +/-   ##
==========================================
- Coverage   43.82%   43.52%   -0.31%     
==========================================
  Files         414      412       -2     
  Lines       71747    71649      -98     
==========================================
- Hits        31443    31185     -258     
- Misses      37447    37660     +213     
+ Partials     2857     2804      -53     
Files with missing lines Coverage Δ
internal/host/hostcommands/install_cmd.go 83.33% <0.00%> (-0.50%) ⬇️
internal/bminventory/inventory.go 71.88% <44.44%> (-0.11%) ⬇️
internal/host/host.go 72.39% <0.00%> (-1.19%) ⬇️
...nternal/controller/controllers/agent_controller.go 75.74% <23.52%> (-0.85%) ⬇️
restapi/embedded_spec.go 0.00% <0.00%> (ø)

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@CrystalChun
Copy link
Contributor Author

/retest-required

@openshift-ci
Copy link

openshift-ci bot commented Jan 13, 2026

@CrystalChun: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@CrystalChun CrystalChun marked this pull request as draft January 14, 2026 22:08
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2026
Allows tracking the status of the ironic agent
through the host model.
If a host requests to install, we'll first check if Ironic is either
complete or not required. If it is required and still in progress,
then we'll error out before sending that command to the host.
If the converged flow is enabled, then we set the status
to `in_progress` if the BMH is not provisioning or
provisioned or to `completed` otherwise.
If the converged flow is not enabled, then the status
is set to `not_required`.
Reducing the cyclomatic complexity of updateIfNeeded function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants