The strangest Docker failure I have ever seen

I am using Docker with docker-mailserver on one of my servers. Very strange problems appeared after migrating some services from a Debian Jessie server to an Ubuntu 16.04 LTS server. Server parameters:

Legacy server:

someuser@legacyserver:~$ uname -r
3.16.0-4-amd64
someuser@legacyserver:~$ dpkg -l | grep systemd
...215-17+deb8u7...
someuser@legacyserver:~$ cat /proc/cmdline
root=ZFS=rpool/ROOT/debian-1 ro boot=zfs quiet

New server:

someuser@newserver:~$ uname -r
4.4.0-21-generic
someuser@newserver:~$ dpkg -l | grep systemd
...229-4ubuntu4...
someuser@newserver:~$ cat /proc/cmdline
root=ZFS=rpool/ROOT/debian-1 apparmor=0 ro

I am running docker-mailserver in Docker inside a Debian Jessie systemd-nspawn container. The first problem I encountered was that cgroups are mounted read-only under the new systemd; this solved that problem:

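# Remount every cgroup hierarchy read-write; tail -n +2 skips the first matching line.
mount | grep cgroup | tail -n +2 | while read -r line
do
    # Fields of a `mount` line: 1=source, 3=mount point, 5=fs type, 6=(options)
    umount -l "$(echo "$line" | cut -f3 -d" ")"
    mount -t "$(echo "$line" | cut -f5 -d" ")" -o "$(echo "$line" | cut -f6 -d" " | rev | cut -c2- | rev | cut -c2- | sed -e 's/ro,/rw,/g')" "$(echo "$line" | cut -f1 -d" ")" "$(echo "$line" | cut -f3 -d" ")"
done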

It simply remounts all cgroups read-write (-o remount cannot be used here).
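To make the option handling in the loop above easier to follow, here is a minimal sketch of just that step. The helper name `rw_opts` is hypothetical and the sample `mount` line is an assumed example, not taken from either server. It strips the parentheses from the option field and flips `ro` to `rw`, matching `ro` only as a whole option so that nothing else in the list is touched.

```shell
# rw_opts is a hypothetical helper: it takes the parenthesised option field
# of a `mount` output line and prints the same options with ro replaced by rw.
rw_opts() {
    opts=${1#\(}      # drop the leading "("
    opts=${opts%\)}   # drop the trailing ")"
    # Replace "ro" only when it is a complete option (start, middle, or end).
    printf '%s\n' "$opts" | sed -e 's/^ro,/rw,/' -e 's/,ro,/,rw,/g' -e 's/,ro$/,rw/'
}

# Assumed sample line in the format printed by `mount`.
line='cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)'
rw_opts "$(echo "$line" | cut -f6 -d" ")"
```

The printed string is what would be passed to `mount -o` in the loop.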

But here is the real problem. I first log in to the systemd-nspawn container and then to the Docker container. When I, for example, reload Postfix (or do anything else) ... BOTH CONTAINERS (the nested Docker one and the systemd-nspawn one) exit as quietly as a mouse ... Like this:

someuser@newserver:~# rsh somesystemdcontainer
Last login: Sun Jun 25 15:27:24 CEST 2017 from host0 on pts/0
Linux somesystemdcontainer 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@somesystemdcontainer:~# rsh mail #this is the docker container
Last login: Sun Jun 25 13:28:18 UTC 2017 from 172.18.0.1 on pts/0
Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 4.4.0-21-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
root@mail:~# service postfix reload
 * Reloading Postfix configuration...
   ...done.
root@mail:~# rlogin: connection closed.
root@newserver:~#

NOTHING IN DMESG, NOTHING IN THE KERNEL LOG, NOTHING ANYWHERE. As you can see from the cmdline, disabling AppArmor on both the kernel side and the userspace side does not help ... After trying to stop the systemd-nspawn container:

jun 25 15:32:26 newserver kernel: INFO: task sh:10962 blocked for more than 120 seconds.
jun 25 15:32:26 newserver kernel:       Tainted: P           O    4.4.0-21-generic #37-Ubuntu
jun 25 15:32:26 newserver kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jun 25 15:32:26 newserver kernel: sh              D ffff88009ebb3c88     0 10962   9487 0x00000102
jun 25 15:32:26 newserver kernel:  ffff88009ebb3c88 0000000000000000 ffff88040dab3700 ffff8800c9450dc0
jun 25 15:32:26 newserver kernel:  ffff88009ebb4000 ffff8800c08008b0 0000000000000001 ffff8800c9450dc0
jun 25 15:32:26 newserver kernel:  ffff8800c2fe87e8 ffff88009ebb3ca0 ffffffff818203f5 ffff8800c9450dc0
jun 25 15:32:26 newserver kernel: Call Trace:
jun 25 15:32:26 newserver kernel:  [<ffffffff818203f5>] schedule+0x35/0x80
jun 25 15:32:26 newserver kernel:  [<ffffffff8111fd4f>] zap_pid_ns_processes+0x13f/0x1a0
jun 25 15:32:26 newserver kernel:  [<ffffffff8108432b>] do_exit+0xa6b/0xae0
jun 25 15:32:26 newserver kernel:  [<ffffffff8122383f>] ? dput+0x2f/0x220
jun 25 15:32:26 newserver kernel:  [<ffffffff81084423>] do_group_exit+0x43/0xb0
jun 25 15:32:26 newserver kernel:  [<ffffffff810904d2>] get_signal+0x292/0x600
jun 25 15:32:26 newserver kernel:  [<ffffffff8102e517>] do_signal+0x37/0x6f0
jun 25 15:32:26 newserver kernel:  [<ffffffff8181fd36>] ? __schedule+0x386/0xa10
jun 25 15:32:26 newserver kernel:  [<ffffffff81083526>] ? do_wait+0x116/0x240
jun 25 15:32:26 newserver kernel:  [<ffffffff8100320c>] exit_to_usermode_loop+0x8c/0xd0
jun 25 15:32:26 newserver kernel:  [<ffffffff81003c5e>] syscall_return_slowpath+0x4e/0x60
jun 25 15:32:26 newserver kernel:  [<ffffffff81824650>] int_ret_from_sys_call+0x25/0x8f
jun 25 15:32:53 newserver systemd[1]: [email protected]: State 'stop-sigterm' timed out. Killing.
jun 25 15:32:53 newserver systemd-nspawn[9483]: somesystemdcontainer login:
jun 25 15:32:53 newserver systemd[1]: [email protected]: Main process exited, code=killed, status=9/KILL
jun 25 15:32:53 newserver systemd[1]: Stopped Container somesystemdcontainer.
jun 25 15:32:53 newserver systemd[1]: [email protected]: Unit entered failed state.
jun 25 15:32:53 newserver systemd[1]: [email protected]: Failed with result 'signal'.
jun 25 15:32:53 newserver systemd[1]: Stopped Container somesystemdcontainer.
jun 25 15:32:53 newserver systemd-machined[2890]: Machine somesystemdcontainer terminated.

PID 10962 is ... bash inside the DOCKER container, which "jumps out of the namespace" in pstree ...
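One way to check whether a process really sits in a different PID namespace, rather than only looking misplaced in pstree, is to compare the `/proc/<pid>/ns/pid` links. This is only a sketch: the helper names `pidns_of` and `same_pidns` are hypothetical, and reading another process's link usually requires root.

```shell
# pidns_of (hypothetical helper) prints the PID-namespace identifier of a
# process, in the form "pid:[<inode>]".
pidns_of() {
    readlink "/proc/$1/ns/pid"
}

# same_pidns (hypothetical helper) succeeds when both PIDs are in the same
# PID namespace.
same_pidns() {
    [ "$(pidns_of "$1")" = "$(pidns_of "$2")" ]
}

# Self-check: the current shell is trivially in its own namespace.
if same_pidns "$$" "$$"; then
    echo "same PID namespace"
fi
```

On the host, `same_pidns 1 10962` would then show whether the stuck process shares the host's PID namespace or still belongs to a container's.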

What should I do now?

    
by MobileDevelopment 25.06.2017 / 15:56

0 answers