Is there an existing feature request for this?
Problem or Motivation
Because of GPU shortages, users may bring multiple clusters located in different regions. When deploying a model for serving, or any other AI workload, the workload should be scheduled to one of the provided clusters based on the capacity it requires and the capacity each cluster has available.
Proposed Solution
Add multi-cluster fleet support: manage multiple clusters as one fleet and, at deployment time, select the appropriate cluster by comparing the workload's required capacity with each cluster's available capacity.
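
To make the selection step concrete, here is a rough sketch of one way it could work. Everything in it is hypothetical and only for illustration: the `Cluster` type, its fields, the `select_cluster` function, and the cluster names are not an existing API. It assumes capacity is measured in whole GPUs and uses a best-fit rule (pick the cluster that fits the request with the fewest GPUs left over); other policies, such as region preference or cost, would work just as well.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cluster:
    """Hypothetical view of one cluster in the fleet."""
    name: str
    region: str
    total_gpus: int
    used_gpus: int

    @property
    def free_gpus(self) -> int:
        # Available capacity is whatever is not currently in use.
        return self.total_gpus - self.used_gpus


def select_cluster(clusters: Sequence[Cluster], required_gpus: int) -> Optional[Cluster]:
    """Return the cluster that fits the request most tightly, or None if none fits."""
    # Keep only clusters with enough free GPUs for the workload.
    candidates = [c for c in clusters if c.free_gpus >= required_gpus]
    if not candidates:
        return None
    # Best fit: smallest leftover capacity; the name breaks ties deterministically.
    return min(candidates, key=lambda c: (c.free_gpus - required_gpus, c.name))


# Example fleet (made-up names and numbers).
fleet = [
    Cluster(name="cluster-a", region="us-east", total_gpus=8, used_gpus=6),
    Cluster(name="cluster-b", region="eu-west", total_gpus=16, used_gpus=4),
    Cluster(name="cluster-c", region="ap-south", total_gpus=8, used_gpus=0),
]

chosen = select_cluster(fleet, required_gpus=4)
print(chosen.name if chosen else "no cluster has enough capacity")
```

Best fit keeps the larger pools of free GPUs intact for bigger workloads. A real implementation would also need to refresh capacity data from each cluster and handle the case where no cluster can take the workload, for example by queueing it.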
Alternatives Considered
No response
Feature Area
Deployments / Model Management
How important is this feature to you?
Nice to have
Mockups or Examples
No response
Additional Context
No response