I'm trying to get familiar with CoreOS (v. 633.1.0) and have been playing around with fleet (I'm using the recommended 3-machine Vagrant cluster configuration on my local machine). I created the following very basic service (apache@.service):
[Unit]
Description="Dummy Apache service"
After="docker.service"
Requires="docker.service"
[Service]
TimeoutStartSec=0
TimeoutStopSec=30
ExecStartPre=-/usr/bin/docker kill apache1
ExecStartPre=-/usr/bin/docker rm apache1
ExecStartPre=/usr/bin/docker pull coreos/apache
ExecStart=/usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND
ExecStop=/usr/bin/docker stop apache1
[X-Fleet]
Conflicts=apache@*.service
When I start it, it works perfectly; however, whenever I try to stop it, it is marked as failed. This is the output of, e.g., fleetctl status apache@2 when the stop command starts executing:
core@core-01 ~ $ fleetctl stop apache@2
Unit apache@2.service loaded on a91e28b0.../172.17.8.101
core@core-01 ~ $ fleetctl status apache@2
● apache@2.service - "Dummy Apache container"
Loaded: loaded (/run/fleet/units/apache@2.service; linked-runtime; vendor preset: disabled)
Active: deactivating (stop) since Wed 2015-04-15 18:45:46 UTC; 2s ago
Process: 1038 ExecStartPre=/usr/bin/docker pull coreos/apache (code=exited, status=0/SUCCESS)
Process: 1030 ExecStartPre=/usr/bin/docker rm apache1 (code=exited, status=1/FAILURE)
Process: 1024 ExecStartPre=/usr/bin/docker kill apache1 (code=exited, status=1/FAILURE)
Main PID: 1375 (docker); : 1522 (docker)
CGroup: /system.slice/system-apache.slice/apache@2.service
├─1375 /usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND
└─control
└─1522 /usr/bin/docker stop apache1
Apr 15 18:43:18 core-01 docker[1038]: 9cd978db300e: Pulling fs layer
Apr 15 18:44:26 core-01 docker[1038]: 9cd978db300e: Download complete
Apr 15 18:44:26 core-01 docker[1038]: 87026dcb0044: Pulling metadata
Apr 15 18:44:27 core-01 docker[1038]: 87026dcb0044: Pulling fs layer
Apr 15 18:44:53 core-01 docker[1038]: 87026dcb0044: Download complete
Apr 15 18:44:53 core-01 docker[1038]: 87026dcb0044: Download complete
Apr 15 18:44:53 core-01 docker[1038]: Status: Downloaded newer image for coreos/apache:latest
Apr 15 18:44:53 core-01 systemd[1]: Started "Dummy Apache container".
Apr 15 18:44:53 core-01 docker[1375]: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.0.2 for ServerName
Apr 15 18:45:46 core-01 systemd[1]: Stopping "Dummy Apache container"...
However, after a few seconds:
core@core-01 ~ $ fleetctl status apache@2
● apache@2.service - "Dummy Apache container"
Loaded: loaded (/run/fleet/units/apache@2.service; linked-runtime; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2015-04-15 18:45:56 UTC; 1s ago
Process: 1522 ExecStop=/usr/bin/docker stop apache1 (code=exited, status=0/SUCCESS)
Process: 1375 ExecStart=/usr/bin/docker run --rm --name apache1 -p 80:80 coreos/apache /usr/sbin/apache2ctl -D FOREGROUND (code=exited, status=137)
Process: 1038 ExecStartPre=/usr/bin/docker pull coreos/apache (code=exited, status=0/SUCCESS)
Process: 1030 ExecStartPre=/usr/bin/docker rm apache1 (code=exited, status=1/FAILURE)
Process: 1024 ExecStartPre=/usr/bin/docker kill apache1 (code=exited, status=1/FAILURE)
Main PID: 1375 (code=exited, status=137)
Apr 15 18:44:53 core-01 docker[1038]: 87026dcb0044: Download complete
Apr 15 18:44:53 core-01 docker[1038]: Status: Downloaded newer image for coreos/apache:latest
Apr 15 18:44:53 core-01 systemd[1]: Started "Dummy Apache container".
Apr 15 18:44:53 core-01 docker[1375]: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.0.2 for ServerName
Apr 15 18:45:46 core-01 systemd[1]: Stopping "Dummy Apache container"...
Apr 15 18:45:56 core-01 docker[1522]: apache1
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service: main process exited, code=exited, status=137/n/a
Apr 15 18:45:56 core-01 systemd[1]: Stopped "Dummy Apache container".
Apr 15 18:45:56 core-01 systemd[1]: Unit apache@2.service entered failed state.
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service failed.
From what I can see, docker appears to be killing the container (which is what the ExecStartPre command is supposed to do when the service starts). I also looked into the logs with journalctl, and that seems to be the case.
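A note that may help when reading the status output above: by shell convention, an exit status greater than 128 usually means the process was terminated by a signal, and 137 corresponds to 128 + 9, that is, SIGKILL. The sketch below only reproduces that convention in a POSIX-like shell; it does not involve docker or fleet, and the variable name status is purely illustrative.

```shell
# Start a child shell that sends SIGKILL (signal 9) to itself,
# then capture the exit status the parent shell reports for it.
status=0
sh -c 'kill -KILL $$' || status=$?

echo "exit status: $status"

# Statuses above 128 conventionally encode "128 + signal number".
if [ "$status" -gt 128 ]; then
  echo "terminated by signal $((status - 128))"
fi
```

If the same convention applies here, status=137 on the docker run process would suggest the container's main process was killed with SIGKILL rather than exiting on its own.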
Correction: After further inspection (and with my eyes closer to the screen), I noticed this very important line in the logs:
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="Container afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1 failed to exit within 10 seconds of SIGTERM -
So I can see that docker fails to stop the service properly (why, I don't understand); however, I still don't understand why it then goes ahead and tries to destroy the container. Here is the rest of the relevant logs:
Apr 15 18:45:46 core-01 systemd[1]: Stopping "Dummy Apache container"...
Apr 15 18:45:46 core-01 fleetd[880]: INFO manager.go:145: Triggered systemd unit apache@2.service stop: job=2115
Apr 15 18:45:46 core-01 fleetd[880]: INFO reconcile.go:311: AgentReconciler completed task: type=StopUnit apache@2.service reason="unit currently launched but desired state is loaded"
Apr 15 18:45:46 core-01 dockerd[881]: time="2015-04-15T18:45:46Z" level="info" msg="POST /v1.17/containers/apache1/stop?t=10"
Apr 15 18:45:46 core-01 dockerd[881]: time="2015-04-15T18:45:46Z" level="info" msg="+job stop(apache1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="Container afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1 failed to exit within 10 seconds of SIGTERM -
Apr 15 18:45:56 core-01 kernel: docker0: port 1(veth87a841f) entered disabled state
Apr 15 18:45:56 core-01 kernel: device veth87a841f left promiscuous mode
Apr 15 18:45:56 core-01 kernel: docker0: port 1(veth87a841f) entered disabled state
Apr 15 18:45:56 core-01 systemd-networkd[830]: veth87a841f : lost carrier
Apr 15 18:45:56 core-01 systemd-networkd[830]: veth87a841f : could not find udev device: No such device
Apr 15 18:45:56 core-01 systemd-networkd[830]: docker0 : lost carrier
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job log(die, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job log(die, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job release_interface(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job attach(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="POST /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1/wait"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job wait(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job release_interface(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job log(stop, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job log(stop, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job stop(apache1) = OK (0)"
Apr 15 18:45:56 core-01 docker[1522]: apache1
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job wait(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="GET /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1/json"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job container_inspect(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job container_inspect(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="DELETE /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1?v=1"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job rm(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="POST /v1.17/containers/afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1/kill?signal=TERM"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job kill(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, TERM)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="+job log(destroy, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job log(destroy, afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, coreos/apache:latest) = OK (0)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job rm(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1) = OK (0)"
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service: main process exited, code=exited, status=137/n/a
Apr 15 18:45:56 core-01 systemd[1]: Stopped "Dummy Apache container".
Apr 15 18:45:56 core-01 systemd[1]: Unit apache@2.service entered failed state.
Apr 15 18:45:56 core-01 systemd[1]: apache@2.service failed.
Apr 15 18:45:56 core-01 dockerd[881]: No such container: afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="info" msg="-job kill(afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1, TERM) = ERR (1)"
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="error" msg="Handler for POST /containers/{name:.*}/kill returned error: No such container: afa4e52d8ff2edaa53d2bae1177535d669a4758f
Apr 15 18:45:56 core-01 dockerd[881]: time="2015-04-15T18:45:56Z" level="error" msg="HTTP Error: statusCode=404 No such container: afa4e52d8ff2edaa53d2bae1177535d669a4758f1cc524056ba55a9da383ffd1"
I'm not sure what is going on here, so any help is much appreciated.
UPDATE: Just to test, I increased the timeouts in my service file (docker stop -t 300 apache1 and TimeoutStopSec=300); however, the container still fails to stop even after waiting 5 minutes. To be sure, I tried stopping the container directly from the command line (docker stop apache1), and of course that works just fine. So there seems to be something preventing the container from being stopped properly through fleetctl.
Thanks for your time!
The reason docker is stopping the container is that apache in the container is failing.
Are you sure it is working perfectly?
I would say the most likely problem is that it is not staying up: it runs, then stops and restarts. Most likely that is because port 80 is in use, since you are binding it to host port 80 in every instance. At least 2 of them will collide with the others, since they are all on the same host running vagrant. This would not be a problem if the unit file were running on separate machines and nothing else were using port 80.
To get around this, use -P instead. Then, on each machine, you will have to run docker ps to see which port is actually being used on the outside.
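As a rough sketch of that last step, assuming the unit's ExecStart has been changed to use -P instead of -p 80:80 and the container is still named apache1: besides docker ps, the docker port command prints the host mapping for a given container port in the form address:port. The helper name host_port and the sample value below are hypothetical and only illustrate how the port number could be picked out of such a mapping.

```shell
# Hypothetical helper: strip everything up to the last ':' from a
# mapping such as "0.0.0.0:49153", leaving only the host port.
host_port() {
  printf '%s\n' "${1##*:}"
}

# On a machine where the container is running, something like:
#   mapping=$(docker port apache1 80)
#   host_port "$mapping"

# Sample value for illustration only (not taken from the question).
host_port "0.0.0.0:49153"
```

Because -P lets docker choose the host port, the value can differ on every machine and after every restart, so anything that needs to reach the container has to look it up rather than assume port 80.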