Estou tentando entender por que o killer da OOM colheu um processo em uma máquina virtual KVM com 1 gigabyte de memória. Embora eu já saiba que rodar máquinas Linux sem swap não é a melhor idéia, estou tentando entender por que o processo de coleta aconteceu.
Primeiro de tudo, aqui está o log do kernel contendo todas as informações que recebi do killer da OOM:
Jan 4 10:04:40 <host> kernel: [53542.948247] tor invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
Jan 4 10:04:40 <host> kernel: [53542.948247] tor cpuset=/ mems_allowed=0
Jan 4 10:04:40 <host> kernel: [53542.948251] CPU: 0 PID: 1439 Comm: tor Not tainted 4.9.0-4-amd64 #1 Debian 4.9.65-3+deb9u1
Jan 4 10:04:40 <host> kernel: [53542.948251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/ 01/2014
Jan 4 10:04:40 <host> kernel: [53542.948252] 0000000000000000 ffffffffb2129134 ffffc4b4801afc28 ffffa0d89a244100
Jan 4 10:04:40 <host> kernel: [53542.948254] ffffffffb1ffe340 0000000000000000 0000000000000000 0000000c024213ca
Jan 4 10:04:40 <host> kernel: [53542.948255] ffffffffb1f84517 000000429a244100 ffffffffc0615695 0000000000000001
Jan 4 10:04:40 <host> kernel: [53542.948257] Call Trace:
Jan 4 10:04:40 <host> kernel: [53542.948261] [<ffffffffb2129134>] ? dump_stack+0x5c/0x78
Jan 4 10:04:40 <host> kernel: [53542.948263] [<ffffffffb1ffe340>] ? dump_header+0x78/0x1fd
Jan 4 10:04:40 <host> kernel: [53542.948264] [<ffffffffb1f84517>] ? get_page_from_freelist+0x3f7/0xb40
Jan 4 10:04:40 <host> kernel: [53542.948268] [<ffffffffc0615695>] ? virtballoon_oom_notify+0x25/0x70 [virtio_balloon]
Jan 4 10:04:40 <host> kernel: [53542.948269] [<ffffffffb1f804aa>] ? oom_kill_process+0x21a/0x3e0
Jan 4 10:04:40 <host> kernel: [53542.948271] [<ffffffffb1f8012d>] ? oom_badness+0xed/0x170
Jan 4 10:04:40 <host> kernel: [53542.948272] [<ffffffffb1f80941>] ? out_of_memory+0x111/0x470
Jan 4 10:04:40 <host> kernel: [53542.948273] [<ffffffffb1f85bb7>] ? __alloc_pages_slowpath+0xbb7/0xbf0
Jan 4 10:04:40 <host> kernel: [53542.948274] [<ffffffffb1f85dee>] ? __alloc_pages_nodemask+0x1fe/0x260
Jan 4 10:04:40 <host> kernel: [53542.948277] [<ffffffffb1fd6151>] ? alloc_pages_current+0x91/0x140
Jan 4 10:04:40 <host> kernel: [53542.948278] [<ffffffffb1f7e940>] ? filemap_fault+0x300/0x5a0
Jan 4 10:04:40 <host> kernel: [53542.948290] [<ffffffffc05677e1>] ? ext4_filemap_fault+0x31/0x50 [ext4]
Jan 4 10:04:40 <host> kernel: [53542.948293] [<ffffffffb1faef81>] ? __do_fault+0x81/0x170
Jan 4 10:04:40 <host> kernel: [53542.948294] [<ffffffffb1fb36c0>] ? handle_mm_fault+0xd80/0x12d0
Jan 4 10:04:40 <host> kernel: [53542.948296] [<ffffffffb2001827>] ? new_sync_read+0xd7/0x120
Jan 4 10:04:40 <host> kernel: [53542.948297] [<ffffffffb1e5ed9c>] ? __do_page_fault+0x25c/0x500
Jan 4 10:04:40 <host> kernel: [53542.948299] [<ffffffffb2408a48>] ? async_page_fault+0x28/0x30
Jan 4 10:04:40 <host> kernel: [53542.948299] Mem-Info:
Jan 4 10:04:40 <host> kernel: [53542.948301] active_anon:125187 inactive_anon:2602 isolated_anon:0
Jan 4 10:04:40 <host> kernel: [53542.948301] active_file:95 inactive_file:1644 isolated_file:0
Jan 4 10:04:40 <host> kernel: [53542.948301] unevictable:0 dirty:0 writeback:8 unstable:0
Jan 4 10:04:40 <host> kernel: [53542.948301] slab_reclaimable:4702 slab_unreclaimable:11348
Jan 4 10:04:40 <host> kernel: [53542.948301] mapped:1458 shmem:2652 pagetables:804 bounce:0
Jan 4 10:04:40 <host> kernel: [53542.948301] free:12103 free_pcp:89 free_cma:0
Jan 4 10:04:40 <host> kernel: [53542.948303] Node 0 active_anon:500748kB inactive_anon:10408kB active_file:380kB inactive_file:6576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:5832kB dirty:0kB writeback:32kB shmem:10608kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:374512 all_unreclaimable? yes
Jan 4 10:04:40 <host> kernel: [53542.948303] Node 0 DMA free:4124kB min:716kB low:892kB high:1068kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:1096kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 4 10:04:40 <host> kernel: [53542.948305] lowmem_reserve[]: 0 959 959 959 959
Jan 4 10:04:40 <host> kernel: [53542.948307] Node 0 DMA32 free:44288kB min:44336kB low:55420kB high:66504kB active_anon:500748kB inactive_anon:10408kB active_file:380kB inactive_file:6576kB unevictable:0kB writepending:32kB present:1032048kB managed:1004452kB mlocked:0kB slab_reclaimable:18808kB slab_unreclaimable:44296kB kernel_stack:1232kB pagetables:3216kB bounce:0kB free_pcp:356kB local_pcp:356kB free_cma:0kB
Jan 4 10:04:40 <host> kernel: [53542.948309] lowmem_reserve[]: 0 0 0 0 0
Jan 4 10:04:40 <host> kernel: [53542.948310] Node 0 DMA: 217*4kB (U) 171*8kB (U) 98*16kB (U) 0*32kB 1*64kB (U) 0*128kB 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4124kB
Jan 4 10:04:40 <host> kernel: [53542.948315] Node 0 DMA32: 2150*4kB (UMEH) 2083*8kB (UMEH) 1189*16kB (UMEH) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44288kB
Jan 4 10:04:40 <host> kernel: [53542.948320] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 4 10:04:40 <host> kernel: [53542.948321] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 4 10:04:40 <host> kernel: [53542.948321] 4401 total pagecache pages
Jan 4 10:04:40 <host> kernel: [53542.948322] 0 pages in swap cache
Jan 4 10:04:40 <host> kernel: [53542.948323] Swap cache stats: add 0, delete 0, find 0/0
Jan 4 10:04:40 <host> kernel: [53542.948323] Free swap = 0kB
Jan 4 10:04:40 <host> kernel: [53542.948323] Total swap = 0kB
Jan 4 10:04:40 <host> kernel: [53542.948324] 262010 pages RAM
Jan 4 10:04:40 <host> kernel: [53542.948324] 0 pages HighMem/MovableOnly
Jan 4 10:04:40 <host> kernel: [53542.948324] 6920 pages reserved
Jan 4 10:04:40 <host> kernel: [53542.948324] 0 pages hwpoisoned
Jan 4 10:04:40 <host> kernel: [53542.948325] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Jan 4 10:04:40 <host> kernel: [53542.948328] [ 437] 0 437 14212 372 24 3 0 0 systemd-journal
Jan 4 10:04:40 <host> kernel: [53542.948329] [ 467] 0 467 11493 313 24 3 0 -1000 systemd-udevd
Jan 4 10:04:40 <host> kernel: [53542.948330] [ 598] 0 598 3005 782 11 3 0 0 haveged
Jan 4 10:04:40 <host> kernel: [53542.948331] [ 603] 0 603 37702 11634 80 3 0 0 salt-minion
Jan 4 10:04:40 <host> kernel: [53542.948332] [ 605] 0 605 62529 306 27 3 0 0 rsyslogd
Jan 4 10:04:40 <host> kernel: [53542.948333] [ 609] 0 609 11618 141 27 3 0 0 systemd-logind
Jan 4 10:04:40 <host> kernel: [53542.948334] [ 612] 105 612 11283 117 26 3 0 -900 dbus-daemon
Jan 4 10:04:40 <host> kernel: [53542.948335] [ 620] 0 620 7483 62 19 3 0 0 cron
Jan 4 10:04:40 <host> kernel: [53542.948336] [ 621] 0 621 17486 189 37 3 0 -1000 sshd
Jan 4 10:04:40 <host> kernel: [53542.948337] [ 633] 0 633 3708 34 12 3 0 0 agetty
Jan 4 10:04:40 <host> kernel: [53542.948338] [ 716] 0 716 130502 15306 119 4 0 0 salt-minion
Jan 4 10:04:40 <host> kernel: [53542.948339] [ 734] 107 734 25516 143 24 3 0 0 ntpd
Jan 4 10:04:40 <host> kernel: [53542.948340] [ 738] 0 738 66842 12779 96 3 0 0 salt-minion
Jan 4 10:04:40 <host> kernel: [53542.948341] [ 1439] 109 1439 139276 93739 245 3 0 0 tor
Jan 4 10:04:40 <host> kernel: [53542.948342] Out of memory: Kill process 1439 (tor) score 368 or sacrifice child
Jan 4 10:04:40 <host> kernel: [53542.948364] Killed process 1439 (tor) total-vm:557104kB, anon-rss:374956kB, file-rss:0kB, shmem-rss:0kB
Jan 4 10:04:40 <host> kernel: [53542.966776] oom_reaper: reaped process 1439 (tor), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Agora tentei encontrar o culpado convertendo a quantidade de páginas para o uso de memória em MiB (x 4/1024) para todas as estatísticas de memória diferentes:
MemInfo Name Pages Memory in MiB
active_anon 125187 489.0117188
inactive_anon 2602 10.1640625
active_file 95 0.37109375
inactive_file 1644 6.421875
writeback 8 0.03125
slam_reclaimable 4702 18.3671875
slab_unreclaimable 11348 44.328125
mapped 1458 5.6953125
shmem 2652 10.359375
pagetables 804 3.140625
pagecache 4401 17.19140625
Mas, mesmo que eu adicione todos esses números juntos, acabarei com um uso de cerca de 600 MiB. Quando apenas usando a soma de todos os valores "rss" da tabela de processos impressa pelo OOM killer, eu acabo com apenas 135917 páginas (~ 530 MiB).
Agora, se subtrairmos esse número calculado da quantidade total de memória (por exemplo, 996MiB - 600MiB no pior dos casos com base nos números acima), ainda teremos mais de 300MiB de memória teoricamente utilizável, no entanto dentro do kernel log "free: 44288kB" afirma que apenas ~ 44MiB de memória são deixados.
Por que há uma diferença e o que eu perdi? Existem outras alocações de memória que precisam ser levadas em conta e que estão usando esse espaço?
EDIT 1: Eu quero mencionar que o hypervisor em si tem 32GB de memória e só roda aquela única VM no momento. Existem mais de 20 GB de memória disponíveis de acordo com meminfo / free / top e os outros poucos GB são usados para armazenamento em cache.
EDIT 2: Conforme a solicitação, listarei os módulos do kernel ativos no sistema hipervisor que está executando o Proxmox com o armazenamento do ZFS:
acpi_pad, aes_x86_64, aesni_intel, ahci, ast, autofs4, binfmt_misc, coretemp, crc32_pclmul, crct10dif_pclmul, cryptd, crypto_simd, dca, drm, drm_kms_helper, e1000e, fb_sys_fops, ghash_clmulni_intel, glue_helper, gre, hid, hid_generic, i2c_algo_bit, i2c_i801, ib_cm, ib_core, ib_iser, icp, ie31200_edac, igb, inet_diag, input_leds, intel_cstate, intel_pch_thermal, intel_powerclamp, intel_rapl, intel_rapl_perf, ip6_tables, ip6table_filter, ip_gre, ip_set, ip_tables, ip_tunnel, ipmi_devintf, ipmi_msghandler, ipmi_si, ipmi_ssif, iptable_filter, irqbypass, iscsi_tcp, iw_cm, joydev, kvm, kvm_intel, libahci, libcrc32c, libiscsi, libiscsi_tcp, lpc_ich, mac_hid, mei, mei_me, nf_conntrack, nf_conntrack_ipv4, nf_conntrack_ipv6, nf_conntrack_proto_gre, nf_defrag_ipv4, nf_defrag_ipv6, nf_nat, nf_nat_ipv4, nf_nat_ipv6, nfnetlink, nfnetlink_log, nls_iso8859_1, openvswitch, pcbc, pcspkr, pps_core, ptp, rdma_cm, scsi_transport_iscsi, shpchp, softdog, spl, sunrpc, syscopyarea, sysfillrect, sysimgblt, tap, tcp_diag, ttm, usbhid, usbkbd, usbmouse, vhost, vhost_net, video, x86_pkg_temp_thermal, x_tables, zavl, zcommon, zfs, znvpair, zunicode
Além disso, estes são os módulos do kernel carregados na VM afetada:
ablk_helper, aes_x86_64, aesni_intel, ata_generic, ata_piix, autofs4, binfmt_misc, bochs_drm, button, cdrom, crc16, crc32_pclmul, crc32c_generic, crc32c_intel, crct10dif_pclmul, cryptd, drm, drm_kms_helper, ecb, ehci_hcd, evdev, ext4, floppy, fscrypto, gf128mul, ghash_clmulni_intel, glue_helper, hid, hid_generic, i2c_piix4, ip6_tables, ip6t_REJECT, ip6table_filter, ip_set, ip_set_hash_net, ip_set_list_set, ip_tables, ipmi_msghandler, ipmi_watchdog, ipt_REJECT, iptable_filter, jbd2, joydev, libata, lrw, mbcache, nf_conntrack, nf_conntrack_ipv4, nf_conntrack_ipv6, nf_defrag_ipv4, nf_defrag_ipv6, nf_log_common, nf_log_ipv4, nf_log_ipv6, nf_reject_ipv4, nf_reject_ipv6, nfnetlink, parport, parport_pc, pcspkr, ppdev, psmouse, scsi_mod, sd_mod, serio_raw, sg, shpchp, sr_mod, ttm, uhci_hcd, usb_common, usbcore, usbhid, virtio, virtio_balloon, virtio_net, virtio_pci, virtio_ring, virtio_scsi, x_tables, xt_addrtype, xt_comment, xt_conntrack, xt_limit, xt_LOG, xt_multiport, xt_recent, xt_tcpudp
Por último, mas não menos importante, a linha de comando que está sendo usada para iniciar a máquina virtual:
/usr/bin/kvm -id 100 -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -pidfile /var/run/qemu-server/100.pid -daemonize -smbios type=1,uuid=<censored> -name <censored> -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi -m 1024 -k en-us -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:f4b280ed945d -drive if=none,id=drive-ide2,media=cdrom,aio=threads -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/dev/zvol/rpool/srv/vmdata/vm-100-disk-1,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=<censored>,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300