Pods restarted unexpectedly


My development environments run on Google Container Engine, with the following pods created by a Replication Controller:

NAME                  READY     STATUS    RESTARTS   AGE       NODE
couchdb-dev-ocbud     1/1       Running   3          13h       cz5w
couchdb-stage-8f9bn   1/1       Running   1          13h       uqu4
etcd-1-rmwzy          1/1       Running   0          3d        q0cz
etcd-2-n4ckp          1/1       Running   8          3d        uqu4
etcd-3-yzz2x          1/1       Running   0          3d        yt9e
mongodb-dev-ig9xo     1/1       Running   3          16h       cz5w
mysql-dev-rykih       1/1       Running   3          17h       cz5w
mysql-stage-n240p     1/1       Running   3          16h       cz5w
redis-dev-19dxg       0/1       Running   5          3d        cz5w
redis-dev-s5v6k       1/1       Running   0          3d        yt9e
redis-dev-wccyb       0/1       Running   8          3d        uqu4
redis-stage-qnbb6     0/1       Running   8          3d        uqu4
redis-stage-xb54r     0/1       Running   0          3d        yt9e
redis-stage-xntc2     0/1       Running   5          3d        cz5w
shadowsocks-b8009     1/1       Running   0          2d        q0cz
shadowsocks-i1anu     1/1       Running   0          2d        yt9e
ts-stage-4esg8        1/1       Running   8          3d        uqu4
ts-stage-cer7a        1/1       Running   5          3d        cz5w
ts-stage-dtpdh        1/1       Running   0          3d        yt9e
ts-stage-mah7w        1/1       Running   0          3d        q0cz
uls-dev-upibo         1/1       Running   5          1d        cz5w
uls-stage-zht0j       1/1       Running   6          1d        uqu4
zookeeper-1-4dklm     1/1       Running   0          3d        q0cz
zookeeper-2-pw13k     1/1       Running   8          3d        uqu4
zookeeper-3-u9a34     1/1       Running   0          3d        yt9e

Pods on node uqu4 have been restarted 8 times without any interaction on my part.

Here is the termination reason from kubectl describe pod <pod>; the exit code is 137:

Last Termination State: Terminated
  Reason:           Error
  Exit Code:        137
  Started:          Mon, 21 Mar 2016 08:33:24 +0000
  Finished:         Mon, 21 Mar 2016 21:04:57 +0000
Ready:          True
Restart Count:      8
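
Exit code 137 is 128 + 9, i.e. the container's main process was killed with SIGKILL; one common source of such kills is the kernel OOM killer on the node. A quick way to check for that (a sketch, assuming SSH access to the affected node) would be:

# Look for OOM-killer activity in the node's kernel log
dmesg | grep -iE 'out of memory|killed process'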

When I SSH into node uqu4, I get a warning like the one below:

WARNING: Could not setup log file in /root/.config/gcloud/logs, (OSError: [Errno 28] No space left on device: '/root/.config/gcloud/logs/2016.03.22')

df -h looks OK:

Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                   99G   14G   82G  14% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                   750M  340K  750M   1% /run
/dev/disk/by-uuid/6be8ff15-205a-4019-99e0-92d9c347301b   99G   14G   82G  14% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   1.5G  1.7M  1.5G   1% /run/shm
cgroup                                                  3.7G     0  3.7G   0% /sys/fs/cgroup
tmpfs                                                   3.7G  8.0K  3.7G   1% /var/lib/kubelet/pods/46f374dc-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
tmpfs                                                   3.7G  8.0K  3.7G   1% /var/lib/kubelet/pods/4a17371c-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sdb                                                976M  187M  722M  21% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/etcd-2-data-disk
/dev/sdb                                                976M  187M  722M  21% /var/lib/kubelet/pods/4a13021d-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~gce-pd/etcd-data
tmpfs                                                   3.7G  8.0K  3.7G   1% /var/lib/kubelet/pods/4a13021d-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sdc                                                976M  9.5M  900M   2% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/zookeeper-2-data-disk
/dev/sdc                                                976M  9.5M  900M   2% /var/lib/kubelet/pods/4a5933ee-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~gce-pd/zookeeper-2-data
tmpfs                                                   3.7G  8.0K  3.7G   1% /var/lib/kubelet/pods/4a5933ee-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
tmpfs                                                   3.7G  8.0K  3.7G   1% /var/lib/kubelet/pods/b93210e7-ecfb-11e5-a962-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sdd                                                 30G   48M   28G   1% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/uls-stage-data-disk
/dev/sdd                                                 30G   48M   28G   1% /var/lib/kubelet/pods/f2764484-ee6b-11e5-a962-42010af00080/volumes/kubernetes.io~gce-pd/uls-stage-data-disk
tmpfs                                                   3.7G  8.0K  3.7G   1% /var/lib/kubelet/pods/f2764484-ee6b-11e5-a962-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sde                                                 50G   52M   47G   1% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/couchdb-stage-data-disk
/dev/sde                                                 50G   52M   47G   1% /var/lib/kubelet/pods/e721dfb1-ef5b-11e5-a962-42010af00080/volumes/kubernetes.io~gce-pd/couchdb-stage-data-disk
tmpfs                                                   3.7G  8.0K  3.7G   1% /var/lib/kubelet/pods/e721dfb1-ef5b-11e5-a962-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/disk/by-uuid/6be8ff15-205a-4019-99e0-92d9c347301b   99G   14G   82G  14% /var/lib/docker/aufs
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/8d9c854d1688439657c6b55107f6898d6b9fbdb74b9610dd0b48a1b22c6102d1
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/9e09bc6c69af03192569ba25762861edd710bf45baf65c449a4caf5ad69500f3
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/f82c122422db51310ce965173ca2b043ffa7b55b84f5b28bf9c19004a3e44fa9
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/6a0ccec3cedbcdf481a2ce528f2dcc9d1626f263591bebdb96a77beea0c0443f
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/ae8059fb1c2abbbffc72813a0355a4dd3d2633c720ef61b16d19a46ed2d63358
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/9d5b9ad1148e1ee4e10f826fc500f0a5c549bdc9ed66519e5f3b222d99641dfd
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/668f95f658cb13457b193f31716df5e5b8da7f227bc3ae1e0367098ec20580b0
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/bdf7d3660b81879c75a0048f921fa47b0138c3a9ec5454e85a55e62ccf9d86fe
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/8cb75d5e0df5d34ceefe41ec55a88198568a0670b6bddade4d8bb7194ba49779
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/a9bb332d1aebc349d1440416a59f898f9ed12be1c744e11e8f3e502dd630df0e
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/36a2bd14af419e19fe89fe32e3f02f490f5553246e76d6c7636ae80e6bba8662
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/a8c983eb3b1701263d1501b025f080ae0d967ddee2fd4bd5071e6e9297b389b9
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/e0131ab5360fce8e3a83522b9bc7176d005b893b726bf616d0ee2f7e5ab4269e
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/2e1fd00cb2ec9ca11323b3ac66f413b6873ca2e949ceb3ba5eb368de8de18af5
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/78c89fcc2b2a81c8544883209fac002a3525fed8504ebf43722b592179264dea
none                                                     99G   14G   82G  14% /var/lib/docker/aufs/mnt/4e56c31cbc3dfde7df17c1075595d80214dc81e55093ee9d9b63ef88b09502ad
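
df -h only reports block usage; ENOSPC (Errno 28) can also be raised when a filesystem runs out of inodes, which could be checked with the following (a possible follow-up, not output I have collected):

# Inode usage on the root filesystem; 100% IUse% would also produce "No space left on device"
df -i /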

Here is the output of free:

             total       used       free     shared    buffers     cached
Mem:       7679824    5625036    2054788          0     207872    1148568
-/+ buffers/cache:    4268596    3411228
Swap:            0          0          0

What is causing the pods to restart?

asked by Mr.Wang from Next Door on 22.03.2016 / 03:21

1 answer


I would recommend running the following commands to inspect the current state of the pods and nodes (a concrete invocation is sketched after the list):

  • kubectl describe pod 'failing pod'
  • kubectl get pod -o go-template='{{range.status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}' 'failing pod'
  • kubectl describe node 'node where pod is failing'
  • kubectl get events
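
For example, run against one of the failing pods from the question (redis-dev-19dxg is used here purely as an illustration), the calls might look like this:

# Show the last-state details (reason, exit code, timestamps) for every container in the pod
kubectl get pod redis-dev-19dxg \
  -o go-template='{{range .status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}'

# Recent cluster events, oldest first, to correlate restarts with node conditions
kubectl get events --sort-by=.metadata.creationTimestamp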

These commands can provide detailed information about the pods that may be failing, as well as the nodes the pods are being scheduled on. This Kubernetes link has more information on how to determine the root cause of a pod failure.

To monitor the resources used by the pods, it is best to use the monitoring tools suggested by Kubernetes or the Web UI (Dashboard), since these tools can provide detailed information about the resources used by all pods.
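
If the cluster's metrics add-on is available (Heapster on Container Engine clusters of that era, metrics-server on current ones), a quick command-line view of resource usage is also possible; the commands below are a sketch that assumes that add-on is installed:

# Per-node and per-pod CPU/memory usage (requires Heapster or metrics-server)
kubectl top nodes
kubectl top pods --all-namespaces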

answered on 12.04.2017 / 19:39