The random reboots were caused by a kernel panic, even though nothing in the logs suggested it. After installing and configuring kdump, I got a stack trace that helped me identify the problem. The cause was not obvious.
I have a Debian 8 server that reboots randomly. I looked through the journalctl logs from previous boots (the journal is persistent) but found nothing:
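For anyone in the same situation, here is a rough sketch of how kdump can be enabled on Debian, assuming the `kdump-tools` package and GRUB. The `setup_kdump` function name and the `crashkernel=128M` reservation are my own illustrative choices, not the exact steps from this incident, so adjust them for your machine. It is wrapped in a function so that nothing changes until you call it as root.

```shell
# Hypothetical helper: enable kdump on a Debian system (run as root).
setup_kdump() {
    # Install the kdump tooling.
    apt-get install -y kdump-tools || return 1

    # Enable kdump in the package's defaults file.
    sed -i 's/^USE_KDUMP=.*/USE_KDUMP=1/' /etc/default/kdump-tools

    # Reserve memory for the crash kernel (128M is an assumed value).
    sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 crashkernel=128M"/' /etc/default/grub
    update-grub

    echo "Reboot, then run 'kdump-config show' to check that kdump is ready."
}

# Call it explicitly once you have reviewed the steps:
# setup_kdump
```

After the next panic, the dump and the kernel log should end up under `/var/crash` (the default location for `kdump-tools`, as far as I know).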
$ journalctl -b -1 -e
I tried grepping through all the logs for reboot-, shutdown-, restart-, and panic-related keywords, but found nothing useful:
$ grep -rn "reboot" /var/log
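The separate keyword searches can be combined into a single pass. This is only a sketch: the `grep_reboot_hints` name and the exact keyword list are my assumptions, not the commands I originally ran.

```shell
# Hypothetical helper: search a log directory for several reboot-related
# keywords at once (case-insensitive, recursive, with line numbers).
grep_reboot_hints() {
    grep -rniE 'reboot|shutdown|restart|panic' "$1"
}

# Usage:
# grep_reboot_hints /var/log
```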
I was able to reproduce it on several nodes on GCP and OVH (VPS and dedicated), although some nodes with a similar configuration work fine.
$ last reboot
reboot system boot 3.16.0-4-amd64 Mon May 29 13:20 - 14:21 (01:00)
reboot system boot 3.16.0-4-amd64 Mon May 29 13:11 - 14:21 (01:10)
reboot system boot 3.16.0-4-amd64 Mon May 29 13:06 - 14:21 (01:15)
reboot system boot 3.16.0-4-amd64 Mon May 29 12:58 - 14:21 (01:23)
reboot system boot 3.16.0-4-amd64 Mon May 29 10:53 - 14:21 (03:28)
reboot system boot 3.16.0-4-amd64 Mon May 29 09:51 - 10:52 (01:01)
reboot system boot 3.16.0-4-amd64 Sun May 28 20:29 - 10:52 (14:23)
reboot system boot 3.16.0-4-amd64 Sun May 28 20:01 - 10:52 (14:51)
reboot system boot 3.16.0-4-amd64 Sun May 28 18:45 - 10:52 (16:07)
reboot system boot 3.16.0-4-amd64 Sun May 28 18:36 - 10:52 (16:16)
reboot system boot 3.16.0-4-amd64 Sun May 28 18:19 - 10:52 (16:33)
reboot system boot 3.16.0-4-amd64 Sun May 28 17:51 - 10:52 (17:01)
reboot system boot 3.16.0-4-amd64 Sun May 28 10:20 - 10:52 (1+00:31)
reboot system boot 3.16.0-4-amd64 Sun May 28 09:04 - 10:52 (1+01:48)
reboot system boot 3.16.0-4-amd64 Sun May 28 08:54 - 10:52 (1+01:58)
reboot system boot 3.16.0-4-amd64 Sun May 28 08:48 - 10:52 (1+02:03)
reboot system boot 3.16.0-4-amd64 Sun May 28 08:42 - 10:52 (1+02:10)
reboot system boot 3.16.0-4-amd64 Sun May 28 08:35 - 10:52 (1+02:17)
reboot system boot 3.16.0-4-amd64 Sun May 28 08:18 - 10:52 (1+02:34)
reboot system boot 3.16.0-4-amd64 Sun May 28 08:12 - 10:52 (1+02:40)
reboot system boot 3.16.0-4-amd64 Sun May 28 05:34 - 10:52 (1+05:18)
reboot system boot 3.16.0-4-amd64 Sun May 28 01:03 - 10:52 (1+09:49)
reboot system boot 3.16.0-4-amd64 Sun May 28 01:00 - 10:52 (1+09:52)
reboot system boot 3.16.0-4-amd64 Sat May 27 23:20 - 10:52 (1+11:32)
reboot system boot 3.16.0-4-amd64 Sat May 27 21:22 - 10:52 (1+13:30)
reboot system boot 3.16.0-4-amd64 Sat May 27 21:17 - 10:52 (1+13:35)
reboot system boot 3.16.0-4-amd64 Sat May 27 20:52 - 10:52 (1+14:00)
reboot system boot 3.16.0-4-amd64 Sat May 27 19:32 - 10:52 (1+15:20)
reboot system boot 3.16.0-4-amd64 Sat May 27 18:07 - 10:52 (1+16:45)
reboot system boot 3.16.0-4-amd64 Sat May 27 17:52 - 10:52 (1+17:00)
reboot system boot 3.16.0-4-amd64 Sat May 27 16:32 - 10:52 (1+18:20)
reboot system boot 3.16.0-4-amd64 Sat May 27 12:25 - 10:52 (1+22:27)
reboot system boot 3.16.0-4-amd64 Sat May 27 12:16 - 10:52 (1+22:36)
reboot system boot 3.16.0-4-amd64 Sat May 27 11:07 - 10:52 (1+23:45)
reboot system boot 3.16.0-4-amd64 Sat May 27 09:53 - 10:52 (2+00:59)
reboot system boot 3.16.0-4-amd64 Sat May 27 09:09 - 10:52 (2+01:43)
reboot system boot 3.16.0-4-amd64 Sat May 27 06:39 - 10:52 (2+04:13)
reboot system boot 3.16.0-4-amd64 Sat May 27 06:06 - 10:52 (2+04:46)
reboot system boot 3.16.0-4-amd64 Sat May 27 05:00 - 10:52 (2+05:52)
reboot system boot 3.16.0-4-amd64 Sat May 27 04:53 - 10:52 (2+05:58)
reboot system boot 3.16.0-4-amd64 Sat May 27 03:40 - 10:52 (2+07:12)
reboot system boot 3.16.0-4-amd64 Sat May 27 01:57 - 10:52 (2+08:55)
reboot system boot 3.16.0-4-amd64 Sat May 27 01:13 - 10:52 (2+09:39)
reboot system boot 3.16.0-4-amd64 Fri May 26 22:51 - 10:52 (2+12:01)
reboot system boot 3.16.0-4-amd64 Fri May 26 20:54 - 10:52 (2+13:58)
reboot system boot 3.16.0-4-amd64 Fri May 26 16:50 - 10:52 (2+18:02)
reboot system boot 3.16.0-4-amd64 Fri May 26 15:58 - 10:52 (2+18:54)
reboot system boot 3.16.0-4-amd64 Fri May 26 15:21 - 10:52 (2+19:31)
reboot system boot 3.16.0-4-amd64 Fri May 26 14:41 - 10:52 (2+20:11)
reboot system boot 3.16.0-4-amd64 Fri May 26 13:23 - 10:52 (2+21:29)
reboot system boot 3.16.0-4-amd64 Fri May 26 11:44 - 10:52 (2+23:08)
reboot system boot 3.16.0-4-amd64 Fri May 26 10:55 - 10:52 (2+23:57)
reboot system boot 3.16.0-4-amd64 Fri May 26 10:36 - 10:52 (3+00:16)
reboot system boot 3.16.0-4-amd64 Fri May 26 10:12 - 10:52 (3+00:40)
reboot system boot 3.16.0-4-amd64 Fri May 26 08:27 - 10:52 (3+02:25)
reboot system boot 3.16.0-4-amd64 Fri May 26 08:25 - 10:52 (3+02:27)
reboot system boot 3.16.0-4-amd64 Fri May 26 08:17 - 10:52 (3+02:35)
reboot system boot 3.16.0-4-amd64 Fri May 26 06:45 - 10:52 (3+04:07)
reboot system boot 3.16.0-4-amd64 Fri May 26 04:53 - 10:52 (3+05:59)
reboot system boot 3.16.0-4-amd64 Fri May 26 04:23 - 10:52 (3+06:29)
reboot system boot 3.16.0-4-amd64 Thu May 25 16:25 - 10:52 (3+18:27)
reboot system boot 3.16.0-4-amd64 Thu May 25 16:01 - 10:52 (3+18:51)
reboot system boot 3.16.0-4-amd64 Thu May 25 15:41 - 10:52 (3+19:11)
reboot system boot 3.16.0-4-amd64 Thu May 25 15:24 - 10:52 (3+19:28)
reboot system boot 3.16.0-4-amd64 Thu May 25 15:10 - 10:52 (3+19:42)
reboot system boot 3.16.0-4-amd64 Thu May 25 14:10 - 10:52 (3+20:42)
reboot system boot 3.16.0-4-amd64 Thu May 25 13:54 - 10:52 (3+20:58)
reboot system boot 3.16.0-4-amd64 Thu May 25 13:31 - 10:52 (3+21:21)
reboot system boot 3.16.0-4-amd64 Thu May 25 13:20 - 10:52 (3+21:32)
reboot system boot 3.16.0-4-amd64 Thu May 25 13:03 - 10:52 (3+21:49)
reboot system boot 3.16.0-4-amd64 Thu May 25 12:42 - 10:52 (3+22:10)
reboot system boot 3.16.0-4-amd64 Thu May 25 11:52 - 10:52 (3+23:00)
reboot system boot 3.16.0-4-amd64 Thu May 25 11:44 - 10:52 (3+23:08)
reboot system boot 3.16.0-4-amd64 Thu May 25 11:24 - 10:52 (3+23:28)
reboot system boot 3.16.0-4-amd64 Thu May 25 07:17 - 10:52 (4+03:35)
reboot system boot 3.16.0-4-amd64 Wed May 24 04:42 - 10:52 (5+06:10)
reboot system boot 3.16.0-4-amd64 Wed May 24 04:37 - 04:42 (00:05)
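To get a quick feel for how often this happens, the output above can be summarized per day. A minimal sketch, assuming the column layout shown above (weekday, month, and day in fields 5 to 7); `reboots_per_day` is a name I made up for illustration.

```shell
# Hypothetical helper: count "reboot" records per calendar day
# from `last reboot`-style output read on stdin.
reboots_per_day() {
    awk '$1 == "reboot" {
             day = $5 " " $6 " " $7          # e.g. "Mon May 29"
             if (!(day in n)) order[++k] = day   # remember first-seen order
             n[day]++
         }
         END { for (i = 1; i <= k; i++) print order[i] ": " n[order[i]] }'
}

# Usage:
# last reboot | reboots_per_day
```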
It is very strange that nothing in the logs indicates what triggered the reboot, and there is no sign of a kernel panic either.
I tried replacing /sbin/shutdown as suggested in "Server rebooting mysteriously", but it seems nothing calls it.
journalctl output from right after the reboot: link
Please suggest how I can debug this further.