Pods going into Pending state in k8s

I have cluster autoscaling on my k8s deployment on AWS. The cluster is deployed via kops, and the cluster autoscaler had been working fine for weeks. My cluster is in the Mumbai region (ap-south-1), which has 2 AZs. My setup is a single master plus a worker instance group with min 1 and max 4 t2.medium nodes. Before the problem there were two worker nodes running.
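
For context, the worker instance group in a kops setup like this looks roughly like the sketch below; the group name "nodes" and the subnet names are assumptions about a typical kops layout, not values copied verbatim from my cluster.

$ kops get ig nodes -o yaml   # assumes KOPS_STATE_STORE and KOPS_CLUSTER_NAME are exported
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  role: Node
  machineType: t2.medium
  minSize: 1
  maxSize: 4
  subnets:
  - ap-south-1a
  - ap-south-1b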

Suddenly, today, I noticed that my Prometheus and Grafana pods, which each have a PVC attached, had been running in AZ b. My autoscaler somehow terminated that node and the pods went into Pending state; when I checked them they reported a taint-toleration problem, and the cluster autoscaler didn't even scale up in AZ b.

The worker node that was still running was in AZ a, so I suspect the EBS volume in AZ b cannot be attached to a pod running in AZ a, which is why the pod cannot be scheduled onto that node. But the autoscaler should automatically spin up a new node in AZ b for the pending pods.
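
To check this suspicion, a minimal sketch (assuming the default zone labels that kube-up/kops apply on a 1.10 cluster) is to compare the zone label on the PVs with the zone of the remaining node:

$ kubectl get pv --show-labels
# the EBS-backed PVs should carry failure-domain.beta.kubernetes.io/zone=ap-south-1b if my suspicion is right
$ kubectl get nodes -L failure-domain.beta.kubernetes.io/zone
# shows which zone each node actually sits in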

$ kubectl get pods -n monitoring1
NAME                                       READY     STATUS    RESTARTS   AGE
grafana-6f79f57bdf-dhpjn                   0/1       Pending   0          50m
node-exporter-9f5tc                        1/1       Running   0          5h
node-exporter-c5hfm                        1/1       Running   0          17d
prometheus-554c88bff5-s99lp                0/1       Pending   0          50m
prometheus-alertmanager-55776df4d5-vvs4p   1/1       Running   0          5h



Events:
  Type     Reason             Age                 From                Message
  ----     ------             ----                ----                -------
  Warning  FailedScheduling   22m (x4 over 40m)   default-scheduler   0/3 nodes are available: 1 node(s) had no available volume zone, 1 node(s) had taints that the pod didn't tolerate, 1 node(s) were not ready.
  Warning  FailedScheduling   7m (x121 over 43m)  default-scheduler   0/3 nodes are available: 1 node(s) had no available volume zone, 1 node(s) had taints that the pod didn't tolerate, 1 node(s) were not ready, 1 node(s) were out of disk space.
  Warning  FailedScheduling   3m                  default-scheduler   0/2 nodes are available: 1 node(s) had no available volume zone, 1 node(s) had taints that the pod didn't tolerate.
  Normal   NotTriggerScaleUp  2m (x234 over 43m)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
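
The node with the taint the pods don't tolerate is presumably the master; a quick sketch to confirm (using the master node name from the listing below, nothing else assumed) is:

$ kubectl describe node ip-172-28-48-244.ap-south-1.compute.internal | grep -i taints
# on a kops master this is expected to show node-role.kubernetes.io/master:NoSchedule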



$ kubectl get nodes
NAME                                           STATUS    ROLES     AGE       VERSION
ip-172-28-48-11.ap-south-1.compute.internal    Ready     node      5h        v1.10.8
ip-172-28-48-244.ap-south-1.compute.internal   Ready     master    19d       v1.10.8



$ sudo tail -f   kube-scheduler.log
I1120 13:27:33.438310       1 scheduler.go:191] Failed to schedule pod: monitoring1/grafana-6f79f57bdf-dhpjn
I1120 13:27:33.438455       1 factory.go:1379] Updating pod condition for monitoring1/grafana-6f79f57bdf-dhpjn to (PodScheduled==False)
I1120 13:27:37.438495       1 scheduler.go:191] Failed to schedule pod: monitoring1/prometheus-554c88bff5-s99lp
I1120 13:27:37.439058       1 factory.go:1379] Updating pod condition for monitoring1/prometheus-554c88bff5-s99lp to (PodScheduled==False)
I1120 13:27:37.441452       1 scheduler.go:191] Failed to schedule pod: monitoring1/grafana-6f79f57bdf-dhpjn
I1120 13:27:37.441509       1 factory.go:1379] Updating pod condition for monitoring1/grafana-6f79f57bdf-dhpjn to (PodScheduled==False)
I1120 13:27:45.441715       1 scheduler.go:191] Failed to schedule pod: monitoring1/prometheus-554c88bff5-s99lp
I1120 13:27:45.441797       1 factory.go:1379] Updating pod condition for monitoring1/prometheus-554c88bff5-s99lp to (PodScheduled==False)
I1120 13:27:45.444270       1 scheduler.go:191] Failed to schedule pod: monitoring1/grafana-6f79f57bdf-dhpjn
I1120 13:27:45.444320       1 factory.go:1379] Updating pod condition for monitoring1/grafana-6f79f57bdf-dhpjn to (PodScheduled==False)

These are the cluster-autoscaler logs:

I1120 13:31:51.450181       1 scale_up.go:180] No expansion options
I1120 13:31:51.450417       1 static_autoscaler.go:280] Calculating unneeded nodes
I1120 13:31:51.489385       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"monitoring1", Name:"grafana-6f79f57bdf-qh7fc", UID:"4a9b6e2b-ecc8-11e8-861f-02d918125042", APIVersion:"v1", ResourceVersion:"4374789", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)
I1120 13:31:51.489436       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"monitoring1", Name:"prometheus-554c88bff5-s99lp", UID:"2843eaf4-ecc0-11e8-861f-02d918125042", APIVersion:"v1", ResourceVersion:"4372830", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)
I1120 13:31:51.520270       1 utils.go:407] Skipping ip-172-28-48-11.ap-south-1.compute.internal - node group min size reached
I1120 13:31:51.520305       1 utils.go:398] Skipping ip-172-28-48-244.ap-south-1.compute.internal - no node group config
I1120 13:31:51.520457       1 static_autoscaler.go:309] Scale down status: unneededOnly=false lastScaleUpTime=2018-11-20 09:01:39.646793784 +0000 UTC lastScaleDownDeleteTime=2018-11-20 13:05:34.287809904 +0000 UTC lastScaleDownFailTime=2018-11-20 08:20:10.088739569 +0000 UTC schedulablePodsPresent=false isDeleteInProgress=false
I1120 13:31:51.520481       1 static_autoscaler.go:312] Starting scale down
I1120 13:31:51.546145       1 scale_down.go:446] No candidates for scale down
I1120 13:31:52.391232       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
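
The "min size reached" and "No expansion options" lines make me wonder whether the autoscaler even sees a node group it could grow in AZ b. A rough way to check which AZs the worker ASG actually spans (the 'nodes' filter is an assumption; kops normally names the ASG after the instance group and cluster) is:

$ aws autoscaling describe-auto-scaling-groups --region ap-south-1 \
    --query "AutoScalingGroups[?contains(AutoScalingGroupName, 'nodes')].[AutoScalingGroupName,AvailabilityZones,MinSize,MaxSize]"
# if the ASG only lists ap-south-1a, a scale-up can never produce a node in ap-south-1b for these pods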
    
by Mohd, 20.11.2018 / 18:41

0 answers