Background
Open Cluster Management (OCM) is a community-driven project that is focused on multicluster and multicloud scenarios for Kubernetes applications.
In a multicluster environment, users such as administrators usually need to apply configuration to target clusters. In other situations, application developers may want to deploy a workload to specific clusters. The workload might be a Kubernetes Service, Deployment, ConfigMap, or a bundle of different Kubernetes objects. Users typically have requirements for the target clusters, which might include the following examples:
- I want to only configure the clusters on Amazon Web Services (AWS).
- I want to only deploy this workload to clusters that have the label `group=dev`.
- I want the workload always running on the 3 clusters with the most allocatable memory.
To select the target clusters, you can choose to hardcode the target cluster names in the deploy pipelines, or use some form of label selectors. For workloads that have requirements on resources, you need a fine-grained scheduler to dispatch workload to clusters with sufficient resources. The schedule decision should always dynamically update when the cluster attributes change.
In OCM, the previously described scheduling features are achieved by the placement component. In this blog, I will explain how placement selects desired clusters, what scheduling capabilities placement provides now, and some best practices you can use when writing a placement to suit your requirements. Some advanced scheduling features, such as support for taints and tolerations and topological selection (spread), are under active discussion in the OCM community.
The placement features are also delivered as Technology Preview features in Red Hat Advanced Cluster Management version 2.4.
What is placement?
The `Placement` API is used to select a set of managed clusters in one or multiple `ManagedClusterSets` so that workloads can be deployed to these clusters.
If you define a valid `Placement`, the placement controller generates a corresponding `PlacementDecision` with the selected clusters listed in the status. As an end user, you can parse the selected clusters and then operate on the target clusters. You can also integrate a high-level workload orchestrator with the placement decision to leverage its scheduling capabilities.
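As a sketch of that parsing step, the selected cluster names can be pulled out of a `PlacementDecision` object with a few lines of Python. This is an illustrative snippet, not part of OCM itself; it assumes the decision has already been fetched (for example, with a Kubernetes client) as a plain dict:

```python
# Minimal sketch: extract target cluster names from a PlacementDecision
# object represented as a dict. The structure mirrors the PlacementDecision
# example shown later in this post.

def selected_clusters(placement_decision):
    """Return the cluster names listed in a PlacementDecision status."""
    decisions = placement_decision.get("status", {}).get("decisions", [])
    return [d["clusterName"] for d in decisions]

decision = {
    "apiVersion": "cluster.open-cluster-management.io/v1alpha1",
    "kind": "PlacementDecision",
    "status": {
        "decisions": [
            {"clusterName": "cluster1", "reason": ""},
            {"clusterName": "cluster2", "reason": ""},
        ]
    },
}

print(selected_clusters(decision))  # ['cluster1', 'cluster2']
```

With the names in hand, a pipeline can then target each cluster in turn, which is exactly what integrations like the one below automate.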
For example, Argo has an integration with Placement. In the `clusterDecisionResource` of an `ApplicationSet`, you can specify a `ConfigMap` that is associated with a `PlacementDecision`, so Argo can use the scheduling decision of the Placement to automatically assign the application to a set of target clusters. For example:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: book-import
spec:
  generators:
    - clusterDecisionResource:
        configMapRef: ocm-placement
        labelSelector:
          matchLabels:
            cluster.open-cluster-management.io/placement: local-cluster
        requeueAfterSeconds: 30
  template:
    # (application template omitted)
```

The referenced `ConfigMap` tells Argo which resource kind carries the decisions and how to read them:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ocm-placement
data:
  apiVersion: cluster.open-cluster-management.io/v1alpha1
  kind: placementdecisions
  statusListKey: decisions
  matchKey: clusterName
```

The associated `PlacementDecision` resembles the following:

```yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: PlacementDecision
metadata:
  labels:
    cluster.open-cluster-management.io/placement: local-cluster
  name: local-cluster-decision-1
status:
  decisions:
    - clusterName: cluster1
      reason: ""
    - clusterName: cluster2
      reason: ""
```
KubeVela, as an implementation of the Open Application Model, can also use the `Placement` API for workload scheduling.
How does placement select clusters?
Let's take a deeper look at the Placement API to see how it selects the desired clusters and what scheduling abilities it provides.
The following is an example of a `Placement`:
```yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: Placement
metadata:
  name: placement
  namespace: ns1
spec:
  numberOfClusters: 4
  clusterSets:
    - clusterset1
    - clusterset2
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            vendor: OpenShift
  prioritizerPolicy:
    mode: Exact
    configurations:
      - scoreCoordinate:
          builtIn: ResourceAllocatableMemory
      - scoreCoordinate:
          builtIn: Steady
        weight: 3
      - scoreCoordinate:
          type: AddOn
          addOn:
            resourceName: default
            scoreName: cpuratio
```
The `spec` contains the following four optional sections:

- `numberOfClusters` represents the desired number of `ManagedClusters` to be selected that meet the placement requirements.
- `clusterSets` represents the `ManagedClusterSets` from which the `ManagedClusters` are selected.
- `predicates` represents a slice of predicates to select `ManagedClusters` with label and claim selectors. The predicates are ORed.
- `prioritizerPolicy` represents the policy of prioritizers. The `mode` value sets whether or not to use the default prioritizers. Specific prioritizers can be configured in `configurations`. Currently, the default built-in prioritizers include `Balance`, `Steady`, `ResourceAllocatableCPU`, and `ResourceAllocatableMemory`. Placement also supports the selection of clusters based on scores provided by third parties defined in `addOn`. The `weight` is an integer from `-10` to `10` that adjusts the effect of different prioritizer scores on the total score.
A default value is used if the values in a section are not defined. The details of the values in each field are defined in `PlacementSpec`. If the `spec` is empty, all `ManagedClusters` from the `ManagedClusterSets` bound to the placement namespace are selected as possible choices.
The definition of each section plays a role in the scheduling. A typical scheduling process follows these steps:

- The scheduling framework identifies available `ManagedClusters` from the `ManagedClusterSets` that are defined in `clusterSets`.
- The scheduling filter plugin selects `ManagedClusters` by using the label and claim selectors that are defined in `predicates`.
- The scheduling prioritizer plugins that are enabled in `prioritizerPolicy` assign a score to each filtered `ManagedCluster` and prioritize them by total score from high to low.
- The framework selects the top `k` clusters and lists them in `PlacementDecision`. The value of `k` is the number of clusters defined in `numberOfClusters`.
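The filter-score-select flow can be sketched in a few lines of Python. This is an illustrative model, not OCM's actual scheduler code; cluster data, score values, and function names are all made up:

```python
# Sketch of the scheduling flow: filter by predicate, total up weighted
# prioritizer scores, then take the top k clusters.

def schedule(clusters, predicate, prioritizers, k):
    """clusters: dict of name -> labels; prioritizers: list of (weight, score_fn)."""
    # Filter step: keep only clusters whose labels satisfy the predicate.
    feasible = [name for name, labels in clusters.items() if predicate(labels)]

    # Score step: total score is the weighted sum over all prioritizers.
    def total(name):
        return sum(w * score(name) for w, score in prioritizers)

    # Select step: top k by total score, high to low.
    return sorted(feasible, key=total, reverse=True)[:k]

clusters = {
    "cluster1": {"vendor": "OpenShift"},
    "cluster2": {"vendor": "OpenShift"},
    "cluster3": {"vendor": "Other"},
}
mem_score = {"cluster1": 100, "cluster2": -100, "cluster3": 0}.get

result = schedule(
    clusters,
    predicate=lambda labels: labels.get("vendor") == "OpenShift",
    prioritizers=[(1, mem_score)],
    k=1,
)
print(result)  # ['cluster1']
```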
If applied to the previous example, the scheduling process resembles the following:

- The scheduling framework identifies the available `ManagedClusters` from `clusterset1` and `clusterset2`.
- The scheduling filter plugin filters the clusters with the label `vendor=OpenShift`.
- The scheduling prioritizer plugins named `ResourceAllocatableMemory` and `Steady` assign a score to each filtered `ManagedCluster`. Because `addOn` is defined, the placement also tries to get the cluster score `cpuratio` from third-party resources. The total score of a cluster is calculated by the following formula: `1 (the default weight of the ResourceAllocatableMemory prioritizer, because no weight is specified) * prioritizer_ResourceAllocatableMemory_score + 3 (the weight specified for the Steady prioritizer) * prioritizer_Steady_score + 1 (the default weight of the addOn) * cpuratio (the addOn score)`.
- The framework prioritizes the clusters by total score from high to low, and returns the four clusters with the highest scores.
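The arithmetic of that formula is easy to check with made-up prioritizer scores (the weights 1, 3, and 1 come from the example `Placement`; the score values below are hypothetical):

```python
# Total score for one hypothetical cluster under the example Placement:
# weight 1 for ResourceAllocatableMemory, 3 for Steady, 1 for the addOn score.
w_mem, w_steady, w_addon = 1, 3, 1
mem_score, steady_score, cpuratio = 80, 100, 60   # made-up prioritizer outputs

total = w_mem * mem_score + w_steady * steady_score + w_addon * cpuratio
print(total)  # 80 + 300 + 60 = 440
```

Note how the `Steady` term dominates: tripling its weight means a steady cluster needs a large resource deficit before it loses its slot.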
The score and prioritize step is actually a combination of multiple prioritizers: the algorithm of each prioritizer and its weight both impact the final decision. In the next section, let's take a deeper look at prioritizers so that you can better understand how the placement selects clusters.
How do placement prioritizers work?
At the time this blog was written, there were four available prioritizers:

- `Balance`: Balances the number of decisions among the clusters.
- `Steady`: Ensures that the existing decision is stabilized.
- `ResourceAllocatableCPU` and `ResourceAllocatableMemory`: Make scheduling decisions based on the allocatable CPU or memory of managed clusters. The clusters with the most allocatable resources are given the highest score (100), while the clusters with the least allocatable resources are given the lowest score (-100).

Placement also supports selecting clusters based on customized scores through the `AddOn` score type. You can enable this selection with the new API `AddOnPlacementScore`, which supports a more extensible way to schedule:

- As a *user*, you can specify the score in the placement yaml content to select clusters.
- As a *score provider*, a third-party controller can run on either the hub or a managed cluster to maintain the lifecycle of `AddOnPlacementScore` and update the score in it.

See the enhancements to learn more.
When making cluster decisions, managed clusters are sorted by the final score of each managed cluster, which is the sum of the scores from all prioritizers multiplied by their weights: `final_score = sum(prioritizer_x_weight * prioritizer_x_score)`, where `prioritizer_x_weight` is the weight of prioritizer x and `prioritizer_x_score` is the score returned by prioritizer x for a managed cluster.

You can adjust the weights of the prioritizers to impact the final score. For example:

- Set the weight of the resource prioritizers to schedule placement based on allocatable resources.
- Make the placement sensitive to resource usage by setting a higher weight for the resource prioritizers.
- Ignore resource usage changes and pin the placement decisions by increasing the weight of the `Steady` prioritizer.
Here are some practical examples to illustrate how multiple prioritizers work together to make the final placement decision. These examples can also be treated as some best practices for the specific use cases.
Assumptions:

- There are 3 managed clusters bound to the example namespace `ns1`:
  - `cluster1` has 60 MB of allocatable memory
  - `cluster2` has 80 MB of allocatable memory
  - `cluster3` has 100 MB of allocatable memory
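The blog only states the endpoints of the `ResourceAllocatable*` scale (most allocatable gets 100, least gets -100). A simple linear normalization consistent with that, applied to the three clusters above, looks like the following sketch; the real algorithm may differ, so treat these as illustrative numbers:

```python
# Hedged sketch: linearly map allocatable memory onto [-100, 100],
# matching the stated endpoints (most -> 100, least -> -100).
mem = {"cluster1": 60, "cluster2": 80, "cluster3": 100}

lo, hi = min(mem.values()), max(mem.values())
score = {c: round(-100 + 200 * (m - lo) / (hi - lo)) for c, m in mem.items()}
print(score)  # {'cluster1': -100, 'cluster2': 0, 'cluster3': 100}
```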
Case 1: Selecting clusters with the largest allocatable memory
In this example, you want to select clusters with the largest allocatable memory. To prioritize clusters by allocatable memory, you can configure `ResourceAllocatableMemory` in `prioritizerPolicy` to enable it.
```yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: Placement
metadata:
  name: demo
  namespace: ns1
spec:
  numberOfClusters: 2
  prioritizerPolicy:
    configurations:
      - scoreCoordinate:
          builtIn: ResourceAllocatableMemory
```
When this placement is created, you can describe the `Placement` and check the events to understand how clusters are selected by the prioritizers.
```
# oc describe placement demo -n ns1
Name:         demo
Namespace:    ns1
Labels:       <none>
Annotations:  <none>
API Version:  cluster.open-cluster-management.io/v1alpha1
Kind:         Placement
...
Status:
  Conditions:
    Last Transition Time:       2021-11-09T07:02:14Z
    Message:                    All cluster decisions scheduled
    Reason:                     AllDecisionsScheduled
    Status:                     True
    Type:                       PlacementSatisfied
  Number Of Selected Clusters:  2
Events:
  Type    Reason          Age  From                 Message
  ----    ------          ---  ----                 -------
  Normal  DecisionCreate  10s  placementController  Decision demo-decision-1 is created with placement demo in namespace ns1
  Normal  DecisionUpdate  10s  placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     10s  placementController  cluster1:0 cluster2:100 cluster3:200
```
In this example, the `Balance` and `Steady` prioritizers are enabled by default with a weight of `1` in `Additive` mode. `ResourceAllocatableMemory` is also enabled to make the final decision. The score of a cluster is determined by the following formula:

`1 * prioritizer_balance_score + 1 * prioritizer_steady_score + 1 * prioritizer_resourceallocatablememory_score`
From the event, the total score of `cluster1` is 0, `cluster2` is 100, and `cluster3` is 200. In this case, `cluster2` and `cluster3` should be selected.
Describe the `PlacementDecision` to verify the guess.
```
# oc describe placementdecision demo-decision-1 -n ns1
Name:         demo-decision-1
Namespace:    ns1
Labels:       cluster.open-cluster-management.io/placement=placement-jkd42
Annotations:  <none>
API Version:  cluster.open-cluster-management.io/v1alpha1
Kind:         PlacementDecision
...
Status:
  Decisions:
    Cluster Name:  cluster2
    Reason:
    Cluster Name:  cluster3
    Reason:
Events:            <none>
```
In the `PlacementDecision` status, `cluster2` and `cluster3` are listed in the decisions.
Let's try to add a new cluster with allocatable memory a little higher than that of the selected clusters.
The placement controller watches the managed clusters. When there is a resource change, it starts a reschedule. Now, let's add a new cluster, `cluster4`, with 100 MB of allocatable memory, and check the placement event.
```
# oc describe placement demo -n ns1
...
Events:
  Type    Reason          Age   From                 Message
  ----    ------          ----  ----                 -------
  Normal  DecisionCreate  100s  placementController  Decision demo-decision-1 is created with placement demo in namespace ns1
  Normal  DecisionUpdate  100s  placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     100s  placementController  cluster1:0 cluster2:100 cluster3:200
```
There's no event update and no placement decision update. So when adding a new cluster with 100 MB of allocatable memory, which is only a little higher than the 80 MB of `cluster2`, there's no impact on the placement decision.
Let's try to add a new cluster with allocatable memory much higher than that of the selected clusters.
Now let's try to add a new cluster, `cluster4`, with 150 MB of allocatable memory, and check the placement event again.
```
# oc describe placement demo -n ns1
...
Events:
  Type    Reason          Age    From                 Message
  ----    ------          ----   ----                 -------
  Normal  DecisionCreate  2m10s  placementController  Decision demo-decision-1 is created with placement demo in namespace ns1
  Normal  DecisionUpdate  2m10s  placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     2m10s  placementController  cluster1:0 cluster2:100 cluster3:200
  Normal  DecisionUpdate  3s     placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     3s     placementController  cluster1:200 cluster2:145 cluster3:189 cluster4:200
```
This time, the decision is updated with the change and the placement is rescheduled to `cluster3` and `cluster4`.
```
# oc describe placementdecision demo-decision-1 -n ns1
...
Status:
  Decisions:
    Cluster Name:  cluster3
    Reason:
    Cluster Name:  cluster4
    Reason:
```
In the previous example, when the resource changes a little, there's no update in the `PlacementDecision`. When the resource changes a lot, the changes are reflected in the `PlacementDecision` immediately. This leads to two challenges:

- How can I make my `PlacementDecision` sensitive to resource changes?
- How can I make my `PlacementDecision` steady even if the cluster resources change a lot?

Remember, the `prioritizerPolicy` has four prioritizers whose weights we can adjust. Let's solve these two problems by changing the `prioritizerPolicy`.
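Before changing any weights, a toy calculation shows why they matter. This is a hedged illustration with made-up prioritizer scores, not OCM's actual numbers: a higher resource weight lets a resource gap outvote the bonus that the `Steady` prioritizer gives to already-selected clusters.

```python
# Made-up scores: cluster2 is currently selected (high Steady score),
# cluster4 is new but has more allocatable memory (higher resource score).
steady   = {"cluster2": 100, "cluster4": 0}
resource = {"cluster2": 0,   "cluster4": 50}

def total(cluster, resource_weight):
    # Steady keeps its default weight of 1; only the resource weight varies.
    return 1 * steady[cluster] + resource_weight * resource[cluster]

# Resource weight 1: the Steady bonus wins, the decision stays pinned.
print(total("cluster2", 1), total("cluster4", 1))  # 100 50
# Resource weight 3: the resource gap wins, the placement moves.
print(total("cluster2", 3), total("cluster4", 3))  # 100 150
```

The next two cases apply exactly this lever in opposite directions.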
Case 2: Selecting clusters with the largest allocatable memory and making the placement sensitive to resource changes
To make decisions sensitive to resource changes, this time we explicitly set the `ResourceAllocatableMemory` prioritizer with a weight of `3`.
```yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: Placement
metadata:
  name: demo
  namespace: ns1
spec:
  numberOfClusters: 2
  prioritizerPolicy:
    configurations:
      - scoreCoordinate:
          builtIn: ResourceAllocatableMemory
        weight: 3
```
When this placement is created, let's describe the `Placement` and check the `PlacementDecision`.
```
# oc describe placement demo -n ns1
...
Status:
  Conditions:
    Last Transition Time:       2021-11-09T08:58:40Z
    Message:                    All cluster decisions scheduled
    Reason:                     AllDecisionsScheduled
    Status:                     True
    Type:                       PlacementSatisfied
  Number Of Selected Clusters:  2
Events:
  Type    Reason          Age  From                 Message
  ----    ------          ---  ----                 -------
  Normal  DecisionCreate  35s  placementController  Decision demo-decision-1 is created with placement demo in namespace ns1
  Normal  DecisionUpdate  35s  placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     35s  placementController  cluster1:-200 cluster2:100 cluster3:400
```
```
# oc describe placementdecision demo-decision-1 -n ns1
...
Status:
  Decisions:
    Cluster Name:  cluster2
    Reason:
    Cluster Name:  cluster3
    Reason:
```
The initial placement decision is `cluster2` and `cluster3`.
Now, let's add a new cluster, `cluster4`, with 100 MB of allocatable memory again, and check the placement event.
```
# oc describe placement demo -n ns1
...
Events:
  Type    Reason          Age   From                 Message
  ----    ------          ----  ----                 -------
  Normal  DecisionCreate  3m1s  placementController  Decision demo-decision-1 is created with placement demo in namespace ns1
  Normal  DecisionUpdate  3m1s  placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     3m1s  placementController  cluster1:-200 cluster2:100 cluster3:400
  Normal  DecisionUpdate  2s    placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     2s    placementController  cluster1:-200 cluster2:200 cluster3:500 cluster4:400
```
This time, the `PlacementDecision` is updated. The placement is rescheduled to `cluster3` and `cluster4`.
```
# oc describe placementdecision demo-decision-1 -n ns1
...
Status:
  Decisions:
    Cluster Name:  cluster3
    Reason:
    Cluster Name:  cluster4
    Reason:
```
Case 3: Selecting clusters with the largest allocatable memory and pinning the placement decisions
To make decisions steady, this time we explicitly set the `Steady` prioritizer with a weight of `3`.
```yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: Placement
metadata:
  name: demo
  namespace: ns1
spec:
  numberOfClusters: 2
  prioritizerPolicy:
    configurations:
      - scoreCoordinate:
          builtIn: ResourceAllocatableMemory
      - scoreCoordinate:
          builtIn: Steady
        weight: 3
```
When this placement is created, let's describe the `Placement` and check the `PlacementDecision`.
```
# oc describe placement demo -n ns1
...
Status:
  Conditions:
    Last Transition Time:       2021-11-09T09:05:36Z
    Message:                    All cluster decisions scheduled
    Reason:                     AllDecisionsScheduled
    Status:                     True
    Type:                       PlacementSatisfied
  Number Of Selected Clusters:  2
Events:
  Type    Reason          Age  From                 Message
  ----    ------          ---  ----                 -------
  Normal  DecisionCreate  15s  placementController  Decision demo-decision-1 is created with placement demo in namespace ns1
  Normal  DecisionUpdate  15s  placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     15s  placementController  cluster1:0 cluster2:100 cluster3:200
```
```
# oc describe placementdecision demo-decision-1 -n ns1
...
Status:
  Decisions:
    Cluster Name:  cluster2
    Reason:
    Cluster Name:  cluster3
    Reason:
```
The initial placement decision is `cluster2` and `cluster3`.
Now, let's add a new cluster with 150 MB of allocatable memory again, and check the placement event. This time there's no event update, which means there are no changes in the `PlacementDecision`.
```
# oc describe placement demo -n ns1
...
Events:
  Type    Reason          Age  From                 Message
  ----    ------          ---  ----                 -------
  Normal  DecisionCreate  80s  placementController  Decision demo-decision-1 is created with placement demo in namespace ns1
  Normal  DecisionUpdate  80s  placementController  Decision demo-decision-1 is updated with placement demo in namespace ns1
  Normal  ScoreUpdate     80s  placementController  cluster1:0 cluster2:100 cluster3:200
```
Double-check the `PlacementDecision`. The decision is unchanged and pinned to `cluster2` and `cluster3`.
```
# oc describe placementdecision demo-decision-1 -n ns1
...
Status:
  Decisions:
    Cluster Name:  cluster2
    Reason:
    Cluster Name:  cluster3
    Reason:
```
In the previous three examples, we showed how multiple prioritizers work together and how to influence the final decision by adjusting the weight of each prioritizer. You can try adjusting the weight or changing the enabled prioritizers for your own needs.
Summary
You have seen how to use the placement API in different situations. We explained what placement is and how it works with a popular open source project. We examined how placement selects clusters and how multiple placement prioritizers work together to make the decision, using some real examples. At the end of this post, we also gave some best-practice placement examples for specific user requirements. Feel free to raise your questions in the open-cluster-management-io GitHub community or contact us on Slack.