Linux OOM: Trying to understand the memory statistics / mismatch with the expected values


I am trying to understand why the OOM killer killed a process on a KVM virtual machine with 1 GiB of memory. While I already know that running Linux machines without swap is not the best idea, I would still like to understand why the kill happened in the first place.

First of all, here is the kernel log containing all the information I got from the OOM killer:

Jan  4 10:04:40 <host> kernel: [53542.948247] tor invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
Jan  4 10:04:40 <host> kernel: [53542.948247] tor cpuset=/ mems_allowed=0
Jan  4 10:04:40 <host> kernel: [53542.948251] CPU: 0 PID: 1439 Comm: tor Not tainted 4.9.0-4-amd64 #1 Debian 4.9.65-3+deb9u1
Jan  4 10:04:40 <host> kernel: [53542.948251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
Jan  4 10:04:40 <host> kernel: [53542.948252]  0000000000000000 ffffffffb2129134 ffffc4b4801afc28 ffffa0d89a244100
Jan  4 10:04:40 <host> kernel: [53542.948254]  ffffffffb1ffe340 0000000000000000 0000000000000000 0000000c024213ca
Jan  4 10:04:40 <host> kernel: [53542.948255]  ffffffffb1f84517 000000429a244100 ffffffffc0615695 0000000000000001
Jan  4 10:04:40 <host> kernel: [53542.948257] Call Trace:
Jan  4 10:04:40 <host> kernel: [53542.948261]  [<ffffffffb2129134>] ? dump_stack+0x5c/0x78
Jan  4 10:04:40 <host> kernel: [53542.948263]  [<ffffffffb1ffe340>] ? dump_header+0x78/0x1fd
Jan  4 10:04:40 <host> kernel: [53542.948264]  [<ffffffffb1f84517>] ? get_page_from_freelist+0x3f7/0xb40
Jan  4 10:04:40 <host> kernel: [53542.948268]  [<ffffffffc0615695>] ? virtballoon_oom_notify+0x25/0x70 [virtio_balloon]
Jan  4 10:04:40 <host> kernel: [53542.948269]  [<ffffffffb1f804aa>] ? oom_kill_process+0x21a/0x3e0
Jan  4 10:04:40 <host> kernel: [53542.948271]  [<ffffffffb1f8012d>] ? oom_badness+0xed/0x170
Jan  4 10:04:40 <host> kernel: [53542.948272]  [<ffffffffb1f80941>] ? out_of_memory+0x111/0x470
Jan  4 10:04:40 <host> kernel: [53542.948273]  [<ffffffffb1f85bb7>] ? __alloc_pages_slowpath+0xbb7/0xbf0
Jan  4 10:04:40 <host> kernel: [53542.948274]  [<ffffffffb1f85dee>] ? __alloc_pages_nodemask+0x1fe/0x260
Jan  4 10:04:40 <host> kernel: [53542.948277]  [<ffffffffb1fd6151>] ? alloc_pages_current+0x91/0x140
Jan  4 10:04:40 <host> kernel: [53542.948278]  [<ffffffffb1f7e940>] ? filemap_fault+0x300/0x5a0
Jan  4 10:04:40 <host> kernel: [53542.948290]  [<ffffffffc05677e1>] ? ext4_filemap_fault+0x31/0x50 [ext4]
Jan  4 10:04:40 <host> kernel: [53542.948293]  [<ffffffffb1faef81>] ? __do_fault+0x81/0x170
Jan  4 10:04:40 <host> kernel: [53542.948294]  [<ffffffffb1fb36c0>] ? handle_mm_fault+0xd80/0x12d0
Jan  4 10:04:40 <host> kernel: [53542.948296]  [<ffffffffb2001827>] ? new_sync_read+0xd7/0x120
Jan  4 10:04:40 <host> kernel: [53542.948297]  [<ffffffffb1e5ed9c>] ? __do_page_fault+0x25c/0x500
Jan  4 10:04:40 <host> kernel: [53542.948299]  [<ffffffffb2408a48>] ? async_page_fault+0x28/0x30
Jan  4 10:04:40 <host> kernel: [53542.948299] Mem-Info:
Jan  4 10:04:40 <host> kernel: [53542.948301] active_anon:125187 inactive_anon:2602 isolated_anon:0
Jan  4 10:04:40 <host> kernel: [53542.948301]  active_file:95 inactive_file:1644 isolated_file:0
Jan  4 10:04:40 <host> kernel: [53542.948301]  unevictable:0 dirty:0 writeback:8 unstable:0
Jan  4 10:04:40 <host> kernel: [53542.948301]  slab_reclaimable:4702 slab_unreclaimable:11348
Jan  4 10:04:40 <host> kernel: [53542.948301]  mapped:1458 shmem:2652 pagetables:804 bounce:0
Jan  4 10:04:40 <host> kernel: [53542.948301]  free:12103 free_pcp:89 free_cma:0
Jan  4 10:04:40 <host> kernel: [53542.948303] Node 0 active_anon:500748kB inactive_anon:10408kB active_file:380kB inactive_file:6576kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:5832kB dirty:0kB writeback:32kB shmem:10608kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:374512 all_unreclaimable? yes
Jan  4 10:04:40 <host> kernel: [53542.948303] Node 0 DMA free:4124kB min:716kB low:892kB high:1068kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:1096kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan  4 10:04:40 <host> kernel: [53542.948305] lowmem_reserve[]: 0 959 959 959 959
Jan  4 10:04:40 <host> kernel: [53542.948307] Node 0 DMA32 free:44288kB min:44336kB low:55420kB high:66504kB active_anon:500748kB inactive_anon:10408kB active_file:380kB inactive_file:6576kB unevictable:0kB writepending:32kB present:1032048kB managed:1004452kB mlocked:0kB slab_reclaimable:18808kB slab_unreclaimable:44296kB kernel_stack:1232kB pagetables:3216kB bounce:0kB free_pcp:356kB local_pcp:356kB free_cma:0kB
Jan  4 10:04:40 <host> kernel: [53542.948309] lowmem_reserve[]: 0 0 0 0 0
Jan  4 10:04:40 <host> kernel: [53542.948310] Node 0 DMA: 217*4kB (U) 171*8kB (U) 98*16kB (U) 0*32kB 1*64kB (U) 0*128kB 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4124kB
Jan  4 10:04:40 <host> kernel: [53542.948315] Node 0 DMA32: 2150*4kB (UMEH) 2083*8kB (UMEH) 1189*16kB (UMEH) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44288kB
Jan  4 10:04:40 <host> kernel: [53542.948320] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan  4 10:04:40 <host> kernel: [53542.948321] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan  4 10:04:40 <host> kernel: [53542.948321] 4401 total pagecache pages
Jan  4 10:04:40 <host> kernel: [53542.948322] 0 pages in swap cache
Jan  4 10:04:40 <host> kernel: [53542.948323] Swap cache stats: add 0, delete 0, find 0/0
Jan  4 10:04:40 <host> kernel: [53542.948323] Free swap  = 0kB
Jan  4 10:04:40 <host> kernel: [53542.948323] Total swap = 0kB
Jan  4 10:04:40 <host> kernel: [53542.948324] 262010 pages RAM
Jan  4 10:04:40 <host> kernel: [53542.948324] 0 pages HighMem/MovableOnly
Jan  4 10:04:40 <host> kernel: [53542.948324] 6920 pages reserved
Jan  4 10:04:40 <host> kernel: [53542.948324] 0 pages hwpoisoned
Jan  4 10:04:40 <host> kernel: [53542.948325] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jan  4 10:04:40 <host> kernel: [53542.948328] [  437]     0   437    14212      372      24       3        0             0 systemd-journal
Jan  4 10:04:40 <host> kernel: [53542.948329] [  467]     0   467    11493      313      24       3        0         -1000 systemd-udevd
Jan  4 10:04:40 <host> kernel: [53542.948330] [  598]     0   598     3005      782      11       3        0             0 haveged
Jan  4 10:04:40 <host> kernel: [53542.948331] [  603]     0   603    37702    11634      80       3        0             0 salt-minion
Jan  4 10:04:40 <host> kernel: [53542.948332] [  605]     0   605    62529      306      27       3        0             0 rsyslogd
Jan  4 10:04:40 <host> kernel: [53542.948333] [  609]     0   609    11618      141      27       3        0             0 systemd-logind
Jan  4 10:04:40 <host> kernel: [53542.948334] [  612]   105   612    11283      117      26       3        0          -900 dbus-daemon
Jan  4 10:04:40 <host> kernel: [53542.948335] [  620]     0   620     7483       62      19       3        0             0 cron
Jan  4 10:04:40 <host> kernel: [53542.948336] [  621]     0   621    17486      189      37       3        0         -1000 sshd
Jan  4 10:04:40 <host> kernel: [53542.948337] [  633]     0   633     3708       34      12       3        0             0 agetty
Jan  4 10:04:40 <host> kernel: [53542.948338] [  716]     0   716   130502    15306     119       4        0             0 salt-minion
Jan  4 10:04:40 <host> kernel: [53542.948339] [  734]   107   734    25516      143      24       3        0             0 ntpd
Jan  4 10:04:40 <host> kernel: [53542.948340] [  738]     0   738    66842    12779      96       3        0             0 salt-minion
Jan  4 10:04:40 <host> kernel: [53542.948341] [ 1439]   109  1439   139276    93739     245       3        0             0 tor
Jan  4 10:04:40 <host> kernel: [53542.948342] Out of memory: Kill process 1439 (tor) score 368 or sacrifice child
Jan  4 10:04:40 <host> kernel: [53542.948364] Killed process 1439 (tor) total-vm:557104kB, anon-rss:374956kB, file-rss:0kB, shmem-rss:0kB
Jan  4 10:04:40 <host> kernel: [53542.966776] oom_reaper: reaped process 1439 (tor), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

I then tried to find the culprit by converting the page counts into memory usage in MiB (× 4 / 1024) for all the different memory statistics:

MemInfo Name            Pages   Memory in MiB
active_anon             125187  489.0117188
inactive_anon           2602    10.1640625
active_file             95      0.37109375
inactive_file           1644    6.421875
writeback               8       0.03125
slab_reclaimable        4702    18.3671875
slab_unreclaimable      11348   44.328125
mapped                  1458    5.6953125
shmem                   2652    10.359375
pagetables              804     3.140625
pagecache               4401    17.19140625

But even if I add all of those numbers together, I only end up with a usage of roughly 600 MiB. And if I just sum the "rss" values from the process table printed by the OOM killer, I end up with only 135917 pages (~530 MiB).
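
For reference, this is the plain awk arithmetic behind those sums (4 KiB pages, so MiB = pages * 4 / 1024); the rss values are copied straight from the process table above:

# Sum of the "rss" column from the OOM killer's process table, converted to MiB.
awk 'BEGIN {
    split("372 313 782 11634 306 141 117 62 189 34 15306 143 12779 93739", rss);
    for (i in rss) sum += rss[i];
    printf "rss total: %d pages = %.1f MiB\n", sum, sum * 4 / 1024
}'
# prints: rss total: 135917 pages = 530.9 MiB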

Now, if I subtract that calculated amount from the total memory (for example 996 MiB - 600 MiB in the worst case, based on the numbers above), there should still be more than 300 MiB of theoretically usable memory left. However, the kernel log states "free:44288kB", i.e. only about 44 MiB of memory remaining.
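
For what it's worth, the ~996 MiB total matches the sum of the "managed" sizes of the two zones from the log (DMA 15908 kB plus DMA32 1004452 kB):

awk 'BEGIN { printf "(15908 + 1004452) kB = %.1f MiB\n", (15908 + 1004452) / 1024 }'
# prints: (15908 + 1004452) kB = 996.4 MiB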

Why is there such a difference, and what am I missing? Are there other memory allocations that need to be taken into account and that are using up this space?

EDIT 1: I want to mention that the hypervisor itself has 32 GB of memory and currently runs only that single VM. According to meminfo/free/top, more than 20 GB of memory are available and the remaining few GB are used for caching.

EDIT 2: As requested, here are the active kernel modules on the hypervisor, which runs Proxmox with ZFS storage:

acpi_pad, aes_x86_64, aesni_intel, ahci, ast, autofs4, binfmt_misc, coretemp, crc32_pclmul, crct10dif_pclmul, cryptd, crypto_simd, dca, drm, drm_kms_helper, e1000e, fb_sys_fops, ghash_clmulni_intel, glue_helper, gre, hid, hid_generic, i2c_algo_bit, i2c_i801, ib_cm, ib_core, ib_iser, icp, ie31200_edac, igb, inet_diag, input_leds, intel_cstate, intel_pch_thermal, intel_powerclamp, intel_rapl, intel_rapl_perf, ip6_tables, ip6table_filter, ip_gre, ip_set, ip_tables, ip_tunnel, ipmi_devintf, ipmi_msghandler, ipmi_si, ipmi_ssif, iptable_filter, irqbypass, iscsi_tcp, iw_cm, joydev, kvm, kvm_intel, libahci, libcrc32c, libiscsi, libiscsi_tcp, lpc_ich, mac_hid, mei, mei_me, nf_conntrack, nf_conntrack_ipv4, nf_conntrack_ipv6, nf_conntrack_proto_gre, nf_defrag_ipv4, nf_defrag_ipv6, nf_nat, nf_nat_ipv4, nf_nat_ipv6, nfnetlink, nfnetlink_log, nls_iso8859_1, openvswitch, pcbc, pcspkr, pps_core, ptp, rdma_cm, scsi_transport_iscsi, shpchp, softdog, spl, sunrpc, syscopyarea, sysfillrect, sysimgblt, tap, tcp_diag, ttm, usbhid, usbkbd, usbmouse, vhost, vhost_net, video, x86_pkg_temp_thermal, x_tables, zavl, zcommon, zfs, znvpair, zunicode

Additionally, these are the kernel modules loaded inside the affected VM:

ablk_helper, aes_x86_64, aesni_intel, ata_generic, ata_piix, autofs4, binfmt_misc, bochs_drm, button, cdrom, crc16, crc32_pclmul, crc32c_generic, crc32c_intel, crct10dif_pclmul, cryptd, drm, drm_kms_helper, ecb, ehci_hcd, evdev, ext4, floppy, fscrypto, gf128mul, ghash_clmulni_intel, glue_helper, hid, hid_generic, i2c_piix4, ip6_tables, ip6t_REJECT, ip6table_filter, ip_set, ip_set_hash_net, ip_set_list_set, ip_tables, ipmi_msghandler, ipmi_watchdog, ipt_REJECT, iptable_filter, jbd2, joydev, libata, lrw, mbcache, nf_conntrack, nf_conntrack_ipv4, nf_conntrack_ipv6, nf_defrag_ipv4, nf_defrag_ipv6, nf_log_common, nf_log_ipv4, nf_log_ipv6, nf_reject_ipv4, nf_reject_ipv6, nfnetlink, parport, parport_pc, pcspkr, ppdev, psmouse, scsi_mod, sd_mod, serio_raw, sg, shpchp, sr_mod, ttm, uhci_hcd, usb_common, usbcore, usbhid, virtio, virtio_balloon, virtio_net, virtio_pci, virtio_ring, virtio_scsi, x_tables, xt_addrtype, xt_comment, xt_conntrack, xt_limit, xt_LOG, xt_multiport, xt_recent, xt_tcpudp

Last but not least, the command line used to start the virtual machine:

/usr/bin/kvm -id 100 -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -pidfile /var/run/qemu-server/100.pid -daemonize -smbios type=1,uuid=<censored> -name <censored> -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vga std -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi -m 1024 -k en-us -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:f4b280ed945d -drive if=none,id=drive-ide2,media=cdrom,aio=threads -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/dev/zvol/rpool/srv/vmdata/vm-100-disk-1,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=<censored>,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
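
Note: since virtballoon_oom_notify appears in the call trace and the VM has a virtio-balloon-pci device, it may also be worth checking on the hypervisor how much memory the balloon currently holds. On Proxmox this should be possible through the QEMU monitor; the commands below assume standard Proxmox/QEMU tooling and the VM id 100 from the command line above:

qm monitor 100      # open the QEMU monitor of VM 100 on the Proxmox host
info balloon        # inside the monitor: show the guest's current balloon size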
    
asked by Pascal Mathis 04.01.2018 / 22:35

1 answer


I cannot comment yet, so I will try to put this together as something like an answer. First, have a look at these sources:

The memory zone that triggered the OOM killer was DMA32:

Node 0 DMA32 free:44288kB min:44336kB low:55420kB high:66504kB

From what I understand, and as your log suggests is possible, your server's memory is fragmented:

Node 0 DMA32: 2150*4kB (UMEH) 2083*8kB (UMEH) 1189*16kB (UMEH) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44288kB

The largest free contiguous block available is 16 kB.
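
The per-order counts in that line also add up exactly to the reported free total, so nothing larger than 16 kB is left in that zone:

awk 'BEGIN { printf "%d kB\n", 2150*4 + 2083*8 + 1189*16 }'    # -> 44288 kB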

You can have a look at your current memory fragmentation with:

cat /proc/buddyinfo
cat /proc/pagetypeinfo

Here is an explanation of both files and their contents.
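
As a quick way to read /proc/buddyinfo: each column is the number of free blocks of order N, i.e. 2^N contiguous 4 KiB pages. A small awk sketch, assuming the usual "Node X, zone NAME count0 count1 ..." layout and the DMA32 zone from your log:

# Print the free block count per order for the DMA32 zone as a rough fragmentation view.
awk '/DMA32/ {
    for (i = 5; i <= NF; i++) {
        order = i - 5
        printf "order %2d (%5d kB blocks): %s free\n", order, 4 * 2^order, $i
    }
}' /proc/buddyinfo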

    
answered 05.01.2018 / 20:21