Skip to content

Conversation

@caffeinated92
Copy link
Collaborator

This pull request refactors and improves the handling of restart cookies for containers within the cluster management codebase. The main change is a systematic renaming and specialization of the "restart cookie" logic to "restart container cookie," along with added safety checks to prevent container restarts while jobs are still running. The update also includes corresponding changes to tests and supporting utility methods.

Restart container cookie handling improvements:

  • Renamed all methods and logic from generic "restart cookie" to "restart container cookie" throughout the codebase, including SetRestartContainerCookie, HasRestartContainerCookie, and DelRestartContainerCookie in ServerMonitor and their usages in cluster management and test files. [1] [2] [3] [4] [5] [6]
  • Updated comments and log messages to refer to "restart container cookie" for clarity and consistency. [1] [2] [3] [4]

Safety enhancements for container restarts:

  • Added a check in CheckRestartContainerCookies to defer restart if jobs are still running on the server, using the new HasRunningDBJobs method, and logging a warning if so. [1] [2] [3]

Test updates:

Error and logging improvements:

  • Added a new warning code (WARN0165) for cases when a restart container cookie is deferred due to running jobs, improving observability.

General code maintenance:

  • Updated the cluster run loop and supporting files to use the new container cookie methods, removing the old generic restart cookie logic. [1] [2]

@claude
Copy link

claude bot commented Jan 21, 2026

Code Review: Restart Container Cookie Handling Improvements

Summary

This PR implements a systematic refactoring of restart cookie logic to specifically target container operations, including safety enhancements to prevent restarts during active database jobs. The changes are well-structured and comprehensive.


✅ Strengths

  1. Consistent Refactoring

    • Method names renamed throughout codebase
    • Comments and logging updated for clarity
    • Test coverage updated consistently
    • All 14 files modified maintain naming consistency
  2. Safety Enhancement

    • New HasRunningDBJobs() method properly checks for active database jobs
    • Safety check defers restart if jobs are running
    • Server must be online before job check
    • Uses new error constant WARN0165 for logging
  3. Robust Implementation

    • HasRunningDBJobs() handles missing table gracefully
    • Connection validation before querying
    • Proper error handling with continued execution on failure
    • Uses database timeout constant
  4. Test Coverage

    • All test names updated consistently
    • Tests properly renamed (10+ test references updated)
    • Benchmarks included
    • Documentation files updated

⚠️ Code Quality Observations

  1. Database Query

    • Properly parameterized (no SQL injection risk)
    • Correct state filtering
    • Uses timeout constant
    • Handles missing table gracefully
  2. Error Handling

    • Connection nil checks in place
    • Query timeout applied
    • Table missing handled gracefully
    • Checker continues on error without crashing
  3. Concurrency

    • RestartNode/RestartRid have single writer assumption
    • Job check only reads, doesn't modify state
    • Safe for concurrent access

🎯 Design

  1. Cookie File Location - Stored as cookie_restart_container prevents conflicts if both old and new code run (backward compatible)

  2. Job Check Scope - Only triggers for container#jobs rid, appropriately separated from other restart scenarios

  3. Deferred Restart - When jobs running, restart deferred. Cookie remains for automatic retry next cycle

  4. Parameter Validation - Only container#jobs allowed, validated at both API layer and database layer


✔️ Verification

  • All method names renamed consistently across 14 files
  • New HasRunningDBJobs() method implemented
  • Safety check properly integrated in checker
  • Error handling covers all paths
  • Test names updated consistently
  • New warning code added to error list
  • Comments updated throughout
  • No SQL injection vulnerabilities
  • Null pointer checks in place

🚀 Recommendation

✅ APPROVED

This PR successfully achieves its goals:

  1. Specializes cookie logic for container operations (improves clarity)
  2. Prevents unsafe restarts during active jobs (improves safety)
  3. Maintains backward compatibility (new cookie file name)
  4. Includes comprehensive test coverage

The implementation is solid, well-tested, and production-ready.

@caffeinated92 caffeinated92 merged commit 32cc9be into develop Jan 21, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants