Triggering Scheduling Based on a Custom IDC Resource Watermark

Last updated: 2022-05-24 21:21:36

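When a self-managed cluster is used together with container instances so that cloud resources can absorb business peaks, the KCI scheduling plugin can preferentially schedule newly created Pods onto container instances once the allocation rate of the self-managed resources reaches a configurable threshold.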

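About the KCI scheduling plugin

Kingsoft Cloud provides a scheduler-extender that works with virtual-kubelet and supports the following scheduling policies:

  • While the resource allocation rate of the self-managed cluster is below the specified threshold, Pods are preferentially created on self-managed resources.
  • Once the resource allocation rate of the self-managed cluster exceeds the threshold, Pods are preferentially scheduled onto virtual-kubelet (vk) nodes, that is, created on cloud resources. Different vk nodes can be given different weights.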

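Usage notes

  • Two metrics are currently available: CPU allocation rate and memory allocation rate. As soon as either metric exceeds its threshold, Pods are preferentially scheduled onto vk nodes.
  • Only nodes whose role is node are counted when the allocation rate is computed; master nodes are not counted. For the plugin to work correctly, confirm before use that the cluster's worker nodes carry the label kubernetes.io/role=node.
  • Before use, remove the taints from the virtual-kubelet nodes so that they can take part in normal Kubernetes scheduling.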
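The rules in the notes above can be illustrated with a small Python sketch. It is not the extender's actual code: the Node and PodRequest types, the units, and the function names are assumptions made for illustration, as is the choice of a strict "greater than" comparison for "exceeds the threshold". Which Pods are counted is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    labels: dict
    cpu_allocatable_m: int    # allocatable CPU in millicores (hypothetical unit choice)
    mem_allocatable_mi: int   # allocatable memory in MiB (hypothetical unit choice)


@dataclass
class PodRequest:
    cpu_m: int     # requested CPU in millicores
    mem_mi: int    # requested memory in MiB


def allocation_rates(nodes, pods):
    """Return (cpu_rate, mem_rate), counting only nodes labeled kubernetes.io/role=node."""
    workers = [n for n in nodes if n.labels.get("kubernetes.io/role") == "node"]
    cpu_total = sum(n.cpu_allocatable_m for n in workers)
    mem_total = sum(n.mem_allocatable_mi for n in workers)
    if cpu_total == 0 or mem_total == 0:
        raise ValueError("no allocatable resources on nodes labeled kubernetes.io/role=node")
    cpu_rate = sum(p.cpu_m for p in pods) / cpu_total
    mem_rate = sum(p.mem_mi for p in pods) / mem_total
    return cpu_rate, mem_rate


def prefer_vk(cpu_rate, mem_rate, cpulimit, memlimit):
    """True if either enabled metric exceeds its threshold; a limit of -1 disables that metric."""
    cpu_over = cpulimit >= 0 and cpu_rate > cpulimit
    mem_over = memlimit >= 0 and mem_rate > memlimit
    return cpu_over or mem_over


# Illustrative data only: two worker nodes and one master node, which is ignored.
nodes = [
    Node("10.0.0.179", {"kubernetes.io/role": "node"}, 4000, 8192),
    Node("10.0.0.96", {"kubernetes.io/role": "node"}, 4000, 8192),
    Node("10.0.0.214", {"kubernetes.io/role": "master"}, 4000, 8192),
]
pods = [PodRequest(2200, 2048), PodRequest(2200, 2048)]

cpu_rate, mem_rate = allocation_rates(nodes, pods)
print(cpu_rate, mem_rate, prefer_vk(cpu_rate, mem_rate, cpulimit=0.5, memlimit=0.7))
```

With these made-up numbers, only the CPU metric is above its limit, which is enough to prefer vk nodes because the two metrics are combined with "or".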

Plugin installation and deployment

Deploy the extender

The YAML is as follows:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: extender-conf
  namespace: kube-system
data:
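  # CPU threshold: total requests of all Pods / total allocatable of all nodes. Set to "-1" to ignore this metric.
  cpulimit: "0.5"
  # Memory threshold. Set to "-1" to ignore this metric.
  memlimit: "0.7"
  # Optional: scheduling weights for individual vk nodes; nodeName is the name of a vk node.
  # Do not put comments inside the quoted value: they would become part of the JSON string.
  weight: '[
    {
      "nodeName": "rbkci-virtual-kubelet",
      "weight": 5
    },
    {
      "nodeName": "virtual-kubelet-cn-beijing-i",
      "weight": 1
    }
    ]'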
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scheduler-extender
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: scheduler-extender-admin
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    namespace: kube-system
    name: scheduler-extender
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-scheduler-policy
  namespace: kube-system
data:
 policy.cfg : |
  {
    "kind" : "Policy",
    "apiVersion" : "v1",
    "extenders" : [{
      "urlPrefix": "http://kci-scheduler-extender.kube-system/scheduler",
      "filterVerb": "predicates/always_true",
      "prioritizeVerb": "priorities/group_score",
      "preemptVerb": "preemption",
      "bindVerb": "",
      "weight": 1,
      "enableHttps": false,
      "nodeCacheCapable": false
    }]
  }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kci-scheduler-extender
  namespace: kube-system
  labels:
    app: kci-scheduler-extender
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kci-scheduler-extender
  template:
    metadata:
      labels:
        app: kci-scheduler-extender
    spec:
      serviceAccountName: scheduler-extender
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
      containers:
      - name: kci-scheduler-extender
        image: hub.kce.ksyun.com/ksyun/ksc-scheduler-extender:latest
        imagePullPolicy: IfNotPresent
        ports:
          - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kci-scheduler-extender
  name: kci-scheduler-extender
  namespace: kube-system
spec:
  ports:
    - name: server-port
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: kci-scheduler-extender
  type: ClusterIP

Confirm that kube-scheduler can access ConfigMaps

Check whether the system:kube-scheduler ClusterRole grants access to ConfigMaps:

# kubectl get clusterrole system:kube-scheduler -o yaml
...
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch

If these rules are missing, run kubectl edit clusterrole system:kube-scheduler and add the lines above.
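The same check can be scripted. The sketch below assumes the ClusterRole has been exported as JSON (for example with kubectl get clusterrole system:kube-scheduler -o json) and loaded into a dictionary. The function name and the embedded sample document are illustrative, and the logic is a simplified reading of RBAC rules rather than a full authorization check.

```python
import json

REQUIRED_VERBS = {"get", "list", "watch"}


def grants_configmap_read(cluster_role: dict) -> bool:
    """Return True if the rules allow get, list and watch on core-group configmaps."""
    granted = set()
    for rule in cluster_role.get("rules") or []:
        if rule.get("resourceNames"):
            continue  # rule is limited to specific object names, so it is not counted
        groups = rule.get("apiGroups") or []
        resources = rule.get("resources") or []
        if ("" in groups or "*" in groups) and ("configmaps" in resources or "*" in resources):
            verbs = set(rule.get("verbs") or [])
            if "*" in verbs:
                return True
            granted |= verbs
    return REQUIRED_VERBS <= granted


# Illustrative sample shaped like the rule shown above.
sample = json.loads("""
{
  "kind": "ClusterRole",
  "metadata": {"name": "system:kube-scheduler"},
  "rules": [
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list", "watch"]}
  ]
}
""")
print(grants_configmap_read(sample))
```

A result of False would mean the rules still need to be added with kubectl edit as described above.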

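Modify the kube-scheduler configuration

Modify the kube-scheduler configuration. kube-scheduler is usually deployed as a static Pod on several nodes, and every replica must be modified. The changes are:

  1. Add --policy-configmap=custom-scheduler-policy to command.
  2. Set dnsPolicy to ClusterFirstWithHostNet.

Restart kube-scheduler for the changes to take effect. A modified manifest looks like this: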

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - /usr/local/bin/kube-scheduler
    - --logtostderr=true
    - --v=10
    - --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig
    - --leader-elect=true
    - --leader-elect-lease-duration=60s
    - --leader-elect-renew-deadline=30s
    - --leader-elect-retry-period=10s
    - --kube-api-qps=100
    - --policy-configmap=custom-scheduler-policy   # added: --policy-configmap option
    image: hub.kce.ksyun.com/ksyun/kube-scheduler:v1.17.6-mp
    imagePullPolicy: Always
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    volumeMounts:
    - mountPath: /etc/kubernetes
      name: k8s
      readOnly: true
    - mountPath: /etc/localtime
      name: time-zone
      readOnly: true
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  # dnsPolicy set to ClusterFirstWithHostNet
  tolerations:
  - operator: Exists
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /etc/kubernetes
    name: k8s
  - hostPath:
      path: /etc/localtime
    name: time-zone

Plugin usage example

1. View the extender-conf ConfigMap to see the current scheduling policy:

kubectl -n kube-system describe cm extender-conf

Expected output:

Name:         extender-conf
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>

Data
====
cpulimit:
----
0.5
memlimit:
----
0.7
weight:
----
[ { "nodeName": "rbkci-virtual-kubelet", "weight": 3 }, { "nodeName": "cn-zhangjiakou.vnd-8vb0w6ot6evayqdha0a3", "weight": 1 } ]
Events:  <none>

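This policy means that when the CPU allocation rate of the self-managed cluster exceeds 50%, or the memory allocation rate exceeds 70%, newly created Pods are scheduled onto virtual-kubelet nodes. Out of every 4 such Pods, 3 go to the vk node with weight 3 and 1 goes to the vk node with weight 1.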
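The 3:1 split can be illustrated with a short Python sketch. The document does not describe how the extender actually applies the weights, so the smooth weighted round-robin used here is only an assumed stand-in that produces the same proportions; the function name is hypothetical, and the node names and weights are taken from the output above.

```python
from collections import Counter


def weighted_round_robin(weights: dict, count: int) -> list:
    """Pick `count` node names using smooth weighted round-robin."""
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    picks = []
    for _ in range(count):
        # Every node earns its weight each round; the current leader is picked
        # and then pays back the total, which spreads picks proportionally.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)  # ties go to the first-listed node
        current[chosen] -= total
        picks.append(chosen)
    return picks


weights = {
    "rbkci-virtual-kubelet": 3,
    "cn-zhangjiakou.vnd-8vb0w6ot6evayqdha0a3": 1,
}
placement = Counter(weighted_round_robin(weights, 4))
print(placement)
```

Over any 4 consecutive picks in this sketch, 3 land on the weight-3 node and 1 on the weight-1 node. The order within those 4 picks is a property of this particular algorithm and should not be read as the extender's behavior.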

2. Observe the effect of the scheduling policy

(1) Run the following command to list the cluster nodes:

kubectl get node -o wide

Expected output:

NAME                                      STATUS   ROLES    AGE    VERSION
10.0.0.179                                Ready    node     46d    v1.21.3
10.0.0.214                                Ready    master   46d    v1.21.3
10.0.0.8                                  Ready    master   46d    v1.21.3
10.0.0.83                                 Ready    master   46d    v1.21.3
10.0.0.96                                 Ready    node     46d    v1.21.3
cn-zhangjiakou.vnd-8vbahalwna5205drcwvr   Ready    agent    135m   v1.21.3
rbkci-virtual-kubelet                     Ready    agent    34d    v1.19.3-vk-v1.1.0

The output shows that the cluster currently has two worker nodes, 10.0.0.179 and 10.0.0.96, and two virtual-kubelet nodes, rbkci-virtual-kubelet and cn-zhangjiakou.vnd-8vbahalwna5205drcwvr.

(2) Deploy a test Deployment with the following YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        resources:
          requests:
            cpu: 100m
            memory: 100Mi

(3) Run the following command to see which nodes the Pods are running on:

kubectl get pod -o wide

Expected output:

NAME                      READY   STATUS     RESTARTS   AGE     IP             NODE
nginx-68c8867f7b-rj9dg    1/1     Running    0          4s      10.244.1.74    10.0.0.179

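The output shows that, because neither the CPU nor the memory allocation rate has reached its threshold, the new Pod was scheduled onto a worker node of the self-managed cluster.

(4) Run the following command to scale the Deployment to 7 replicas:

kubectl scale deploy nginx --replicas=7

Check the Pod distribution again:

kubectl get pod -o wide

Expected output: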

NAME                      READY   STATUS     RESTARTS   AGE     IP             NODE
nginx-68c8867f7b-94r4v    1/1     Running    0          3m19s   10.0.0.219     rbkci-virtual-kubelet
nginx-68c8867f7b-9wskx    1/1     Running    0          3m6s    10.0.0.219     cn-zhangjiakou.vnd-8vbahalwna5205drcwvr
nginx-68c8867f7b-pl4xl    1/1     Running    0          3m13s   10.0.0.219     rbkci-virtual-kubelet
nginx-68c8867f7b-pmxjf    1/1     Running    0          3m45s   10.0.0.219     rbkci-virtual-kubelet
nginx-68c8867f7b-rj9dg    1/1     Running    0          3m52s   10.0.0.219     10.0.0.179
nginx-68c8867f7b-sf7xk    1/1     Running    0          5m9s    10.0.0.219     10.0.0.179
nginx-68c8867f7b-twjl2    1/1     Running    0          4m9s    10.0.0.219     10.0.0.179

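The output shows that the first 3 Pods were scheduled onto a worker node of the self-managed cluster because neither the CPU nor the memory allocation rate had reached its threshold. Checking the node allocation rates shows that, from the 4th Pod onward, the ratio of the CPU requests of all Pods in the cluster to the allocatable CPU of all nodes would exceed 0.5. The last 4 Pods were therefore scheduled onto vk nodes and, in line with the weights, 3 went to the vk node with weight 3 and 1 went to the vk node with weight 1.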
