Skip to content

Conversation

@ChayanDass
Copy link
Contributor

Description

This PR adds proper lifecycle cleanup for Kubernetes cache watchers when kube contexts are removed or reloaded. Previously, cache watchers could continue running even after their associated kube context was deleted, leading to stale watchers and potential goroutine leaks.

A new StopWatcher mechanism is introduced and wired into kubeconfig context synchronization to ensure watchers are cancelled and cleaned up correctly.

What Changed

  • Added StopWatcher(contextKey) to cancel and clean up cache watchers
  • Wired kubeconfig context removal and reload to stop active watchers
  • Ensured watcher cleanup is safe and idempotent

Related Issue

Fixes the review feedback on backend cache invalidation tests from PR

Fixes

@k8s-ci-robot k8s-ci-robot requested review from illume and sniok February 2, 2026 13:02
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ChayanDass
Once this PR has been reviewed and has the lgtm label, please assign illume for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 2, 2026
@ChayanDass
Copy link
Contributor Author

@illume, I need a suggestion. Currently, StopWatcher(contextKey) is called during syncContexts,
The callback is registered in server.go (cache middleware) using dependency injection to avoid an import cycle.
Is this sufficient, or would you prefer StopWatcher to be called from additional functions?

@illume
Copy link
Contributor

illume commented Feb 2, 2026

@illume, I need a suggestion. Currently, StopWatcher(contextKey) is called during syncContexts,
The callback is registered in server.go (cache middleware) using dependency injection to avoid an import cycle.
Is this sufficient, or would you prefer StopWatcher to be called from additional functions?

How frequently will this be called? Probably we only need it called if it's using a lot of data, or in hours or days. Is it being called on every request? Think about if it's going to use up a lot of resources just checking if stuff should be freed.

(This sounds sufficient, but I haven't looked closely yet and it's been quite a while since I last looked at that code)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR wires Kubernetes kubeconfig context lifecycle events into the k8s cache watcher lifecycle so that when kube contexts are removed, their corresponding cache watchers are also stopped and cleaned up. It introduces a StopWatcher(contextKey) in the cache layer and a StopContextWatcher callback in the kubeconfig watcher to prevent stale watchers and goroutine leaks.

Changes:

  • Added a StopWatcher(contextKey string) function in backend/pkg/k8cache/cacheInvalidation.go to cancel watcher contexts and remove their entries from watcherRegistry and contextCancel.
  • Introduced a package-level StopContextWatcher callback in backend/pkg/kubeconfig/watcher.go and invoked it when kubeconfig contexts are removed in syncContexts.
  • Connected the two by assigning kubeconfig.StopContextWatcher = k8cache.StopWatcher in CacheMiddleWare, so that kubeconfig watcher cleanup is triggered when contexts disappear.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
backend/pkg/kubeconfig/watcher.go Adds the StopContextWatcher global callback and invokes it before removing contexts from the kubeconfig store.
backend/pkg/k8cache/cacheInvalidation.go Adds StopWatcher to stop watcher goroutines and clean up internal watcher tracking maps.
backend/cmd/server.go Hooks kubeconfig.StopContextWatcher to k8cache.StopWatcher inside CacheMiddleWare to integrate kubeconfig context removal with cache watcher shutdown.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 164 to 172
// StopWatcher stops and cleans up the watcher for a given contextKey.
func StopWatcher(contextKey string) {
if cancelAny, ok := contextCancel.Load(contextKey); ok {
cancelAny.(context.CancelFunc)()
contextCancel.Delete(contextKey)
watcherRegistry.Delete(contextKey)
}
}
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new StopWatcher function lacks unit tests even though cacheInvalidation.go already has tests for related behavior (e.g. RunInformerToWatch in cacheInvalidation_test.go). Given that this function is responsible for cancelling contexts and cleaning up watcherRegistry/contextCancel, adding tests to verify that it correctly removes entries, is safe when called for unknown contextKeys, and is idempotent would help ensure the watcher lifecycle behavior stays correct.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants