I'm new to CoreOS and have been experimenting with it on DigitalOcean. Let me start by saying that I'm not sure whether this is a DigitalOcean problem or a CoreOS problem.
How to reproduce:
- Spin up 2 CoreOS droplets and join them into a cluster via cloud-config.
- In the DigitalOcean dashboard, power off both droplets and resize them.
- Power both droplets back on.
- ssh into one of the droplets
- run fleetctl list-machines
You should get
2015/04/22 21:05:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/04/22 21:05:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
2015/04/22 21:05:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/04/22 21:05:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 200ms
2015/04/22 21:05:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/04/22 21:05:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 400ms
2015/04/22 21:05:51 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/04/22 21:05:51 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 800ms
2015/04/22 21:05:51 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused
2015/04/22 21:05:51 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
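The "connection refused" lines suggest that nothing is listening on the etcd client port, independent of fleetctl. A minimal Python sketch for confirming that from the droplet is below. The function name etcd_port_open is hypothetical, and the host and port defaults are taken from the log output above.

```python
import socket


def etcd_port_open(host="127.0.0.1", port=4001, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    The defaults mirror the address fleetctl reports in its log output;
    they are assumptions, not values read from any configuration.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing accepts connections on the target port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling etcd_port_open() with no arguments should return False while etcd is down, which would match the fleetctl errors.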
Running
journalctl -u etcd
shows
Apr 22 14:38:02 test etcd[578]: [etcd] Apr 22 14:38:02.471 INFO | f507c71154cc47b1804558c7298d0313: state changed from 'leader' to 'follower'.
Apr 22 14:38:02 test etcd[578]: [etcd] Apr 22 14:38:02.471 INFO | f507c71154cc47b1804558c7298d0313: term #7 started.
Apr 22 14:38:02 test etcd[578]: [etcd] Apr 22 14:38:02.471 INFO | f507c71154cc47b1804558c7298d0313: leader changed from 'f507c71154cc47b1804558c7298d0313' to ''.
Apr 22 14:38:11 test etcd[578]: [etcd] Apr 22 14:38:11.257 INFO | f507c71154cc47b1804558c7298d0313: state changed from 'follower' to 'candidate'.
Apr 22 14:38:11 test etcd[578]: [etcd] Apr 22 14:38:11.258 INFO | f507c71154cc47b1804558c7298d0313: leader changed from 'fa61f58c81fd4e7abe9ac0b6585fafef' to ''.
Apr 22 14:38:11 test etcd[578]: [etcd] Apr 22 14:38:11.546 INFO | f507c71154cc47b1804558c7298d0313: state changed from 'candidate' to 'follower'.
Apr 22 14:38:11 test etcd[578]: [etcd] Apr 22 14:38:11.547 INFO | f507c71154cc47b1804558c7298d0313: term #9 started.
Apr 22 14:41:14 test etcd[578]: [etcd] Apr 22 14:41:14.847 INFO | f507c71154cc47b1804558c7298d0313: snapshot of 10004 events at index 10004 completed
Apr 22 14:53:45 test etcd[578]: [etcd] Apr 22 14:53:45.297 INFO | f507c71154cc47b1804558c7298d0313: warning: heartbeat near election timeout: 359.350151ms
Apr 22 14:55:22 test etcd[578]: [etcd] Apr 22 14:55:22.381 INFO | f507c71154cc47b1804558c7298d0313: warning: heartbeat near election timeout: 1.574255587s
Apr 22 15:31:17 test etcd[578]: [etcd] Apr 22 15:31:17.551 INFO | f507c71154cc47b1804558c7298d0313: snapshot of 10001 events at index 20005 completed
Apr 22 16:19:53 test etcd[578]: [etcd] Apr 22 16:19:53.870 INFO | f507c71154cc47b1804558c7298d0313: snapshot of 10007 events at index 30012 completed
Apr 22 17:08:00 test etcd[578]: [etcd] Apr 22 17:08:00.254 INFO | f507c71154cc47b1804558c7298d0313: snapshot of 10007 events at index 40019 completed
Apr 22 17:57:30 test etcd[578]: [etcd] Apr 22 17:57:30.622 INFO | f507c71154cc47b1804558c7298d0313: snapshot of 10008 events at index 50027 completed
Apr 22 18:48:04 test etcd[578]: [etcd] Apr 22 18:48:04.084 INFO | f507c71154cc47b1804558c7298d0313: snapshot of 10008 events at index 60035 completed
Apr 22 19:38:37 test etcd[578]: [etcd] Apr 22 19:38:37.641 INFO | f507c71154cc47b1804558c7298d0313: snapshot of 10007 events at index 70042 completed
Apr 22 20:07:41 test etcd[578]: [etcd] Apr 22 20:07:39.493 INFO | f507c71154cc47b1804558c7298d0313: state changed from 'follower' to 'candidate'.
Apr 22 20:07:44 test etcd[578]: [etcd] Apr 22 20:07:44.282 INFO | f507c71154cc47b1804558c7298d0313: leader changed from 'fa61f58c81fd4e7abe9ac0b6585fafef' to ''.
Apr 22 20:07:44 test etcd[578]: [etcd] Apr 22 20:07:44.895 INFO | f507c71154cc47b1804558c7298d0313: state changed from 'candidate' to 'follower'.
Apr 22 20:07:44 test etcd[578]: [etcd] Apr 22 20:07:44.899 INFO | f507c71154cc47b1804558c7298d0313: term #13 started.
Apr 22 20:09:39 test etcd[578]: [etcd] Apr 22 20:09:39.269 INFO | f507c71154cc47b1804558c7298d0313: state changed from 'follower' to 'candidate'.
Apr 22 20:09:39 test etcd[578]: [etcd] Apr 22 20:09:39.302 INFO | f507c71154cc47b1804558c7298d0313: leader changed from 'fa61f58c81fd4e7abe9ac0b6585fafef' to ''.
Apr 22 20:09:39 test etcd[578]: [etcd] Apr 22 20:09:39.631 INFO | f507c71154cc47b1804558c7298d0313: state changed from 'candidate' to 'follower'.
Apr 22 20:09:39 test etcd[578]: [etcd] Apr 22 20:09:39.632 INFO | f507c71154cc47b1804558c7298d0313: term #15 started.
Apr 22 20:11:18 test systemd[1]: Stopping etcd...
Apr 22 20:11:18 test systemd[1]: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 22 20:11:18 test systemd[1]: Stopped etcd.
Apr 22 20:11:18 test systemd[1]: Unit etcd.service entered failed state.
Apr 22 20:11:18 test systemd[1]: etcd.service failed.
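To make the journal above easier to scan, here is a rough Python sketch that pulls out the state transitions and term numbers. The function name summarize_etcd_journal and both regular expressions are hypothetical and only assume the message wording seen in this particular log.

```python
import re

# Patterns based solely on the message text in the journal output above.
STATE_RE = re.compile(r"state changed from '(\w+)' to '(\w+)'")
TERM_RE = re.compile(r"term #(\d+) started")


def summarize_etcd_journal(lines):
    """Return (transitions, terms) found in an iterable of journal lines.

    transitions is a list of (old_state, new_state) tuples and
    terms is a list of term numbers, both in order of appearance.
    """
    transitions = []
    terms = []
    for line in lines:
        state_match = STATE_RE.search(line)
        if state_match:
            transitions.append((state_match.group(1), state_match.group(2)))
        term_match = TERM_RE.search(line)
        if term_match:
            terms.append(int(term_match.group(1)))
    return transitions, terms
```

Fed the lines above, this would list the repeated follower/candidate flips and the climbing term numbers that precede the service stopping.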
The command below prints the unit file:
systemctl cat etcd.service
# /usr/lib64/systemd/system/etcd.service
[Unit]
Description=etcd
[Service]
User=etcd
PermissionsStartOnly=true
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_NAME=%m
ExecStart=/usr/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000
Is this a CoreOS problem?
Pretty much the whole CoreOS cluster is broken. The machines are no longer connected, and I can't figure out a way to link them back together or to prevent this from happening. I can't find anything about this online.