This was caused by a kernel bug present in Linux kernels 4.7.0 through 4.7.4 (it was fixed by this commit in 4.7.5 and this commit in 4.8.0).
I have an ARM-based server with just under 2 GB of addressable memory and 4 GB of swap enabled:
root@bang:~> free -m
              total        used        free      shared  buff/cache   available
Mem:           1976         388          48          15        1539        1487
Swap:          4095           1        4094
Once the system has been up for a day or so, the OOM killer starts getting aggressive and killing things:
Aug 3 12:59:01 bang kernel: [51585.822794] dump1090 invoked oom-killer: gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), order=2, oom_score_adj=0
Aug 3 12:59:01 bang kernel: [51585.822851] dump1090 cpuset=/ mems_allowed=0
Aug 3 12:59:01 bang kernel: [51585.822963] CPU: 6 PID: 25989 Comm: dump1090 Tainted: G C 4.7.0-41238-g206dbde-dirty #16
Aug 3 12:59:01 bang kernel: [51585.823010] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Aug 3 12:59:01 bang kernel: [51585.823120] [<c010e4ec>] (unwind_backtrace) from [<c010b234>] (show_stack+0x10/0x14)
Aug 3 12:59:01 bang kernel: [51585.823203] [<c010b234>] (show_stack) from [<c04eff84>] (dump_stack+0x88/0x9c)
Aug 3 12:59:01 bang kernel: [51585.823283] [<c04eff84>] (dump_stack) from [<c0227830>] (dump_header+0x5c/0x1b0)
Aug 3 12:59:01 bang kernel: [51585.823357] [<c0227830>] (dump_header) from [<c01d1aec>] (oom_kill_process+0x328/0x494)
Aug 3 12:59:01 bang kernel: [51585.823420] [<c01d1aec>] (oom_kill_process) from [<c01d1fa0>] (out_of_memory+0x2e0/0x338)
Aug 3 12:59:01 bang kernel: [51585.823487] [<c01d1fa0>] (out_of_memory) from [<c01d6724>] (__alloc_pages_nodemask+0xd80/0xda0)
Aug 3 12:59:01 bang kernel: [51585.823555] [<c01d6724>] (__alloc_pages_nodemask) from [<c01d6a28>] (alloc_kmem_pages+0x18/0xb0)
Aug 3 12:59:01 bang kernel: [51585.823620] [<c01d6a28>] (alloc_kmem_pages) from [<c01ee7a4>] (kmalloc_order+0x10/0x20)
Aug 3 12:59:01 bang kernel: [51585.823688] [<c01ee7a4>] (kmalloc_order) from [<c06435b4>] (proc_submiturb+0x60c/0xe88)
Aug 3 12:59:01 bang kernel: [51585.823749] [<c06435b4>] (proc_submiturb) from [<c06446e4>] (usbdev_do_ioctl+0x8b4/0x1bfc)
Aug 3 12:59:01 bang kernel: [51585.823816] [<c06446e4>] (usbdev_do_ioctl) from [<c023c74c>] (do_vfs_ioctl+0x98/0x8e4)
Aug 3 12:59:01 bang kernel: [51585.823879] [<c023c74c>] (do_vfs_ioctl) from [<c023d004>] (SyS_ioctl+0x6c/0x7c)
Aug 3 12:59:01 bang kernel: [51585.823948] [<c023d004>] (SyS_ioctl) from [<c0107740>] (ret_fast_syscall+0x0/0x3c)
Aug 3 12:59:01 bang kernel: [51585.823987] Mem-Info:
Aug 3 12:59:01 bang kernel: [51585.824073] active_anon:43846 inactive_anon:46454 isolated_anon:0
Aug 3 12:59:01 bang kernel: [51585.824073] active_file:132799 inactive_file:109909 isolated_file:19
Aug 3 12:59:01 bang kernel: [51585.824073] unevictable:1408 dirty:56 writeback:0 unstable:0
Aug 3 12:59:01 bang kernel: [51585.824073] slab_reclaimable:17104 slab_unreclaimable:6387
Aug 3 12:59:01 bang kernel: [51585.824073] mapped:13368 shmem:3582 pagetables:971 bounce:0
Aug 3 12:59:01 bang kernel: [51585.824073] free:92967 free_pcp:31 free_cma:32601
Aug 3 12:59:01 bang kernel: [51585.824216] Normal free:13240kB min:3420kB low:4272kB high:5124kB active_anon:26652kB inactive_anon:26692kB active_file:360240kB inactive_file:194904kB unevictable:1336kB isolated(anon):0kB isolated(file):76kB present:770048kB managed:736192kB mlocked:1336kB dirty:16kB writeback:0kB mapped:11600kB shmem:900kB slab_reclaimable:68416kB slab_unreclaimable:25548kB kernel_stack:3384kB pagetables:3884kB unstable:0kB bounce:0kB free_pcp:124kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug 3 12:59:01 bang kernel: [51585.824259] lowmem_reserve[]: 0 9040 9040
Aug 3 12:59:01 bang kernel: [51585.824442] HighMem free:358664kB min:512kB low:1864kB high:3216kB active_anon:148732kB inactive_anon:159124kB active_file:170956kB inactive_file:244732kB unevictable:4296kB isolated(anon):0kB isolated(file):0kB present:1288192kB managed:1288192kB mlocked:4296kB dirty:208kB writeback:0kB mapped:41872kB shmem:13428kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:130404kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug 3 12:59:01 bang kernel: [51585.824483] lowmem_reserve[]: 0 0 0
Aug 3 12:59:01 bang kernel: [51585.824592] Normal: 1300*4kB (UMEH) 525*8kB (UMEH) 11*16kB (H) 9*32kB (H) 8*64kB (H) 5*128kB (H) 3*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 13320kB
Aug 3 12:59:01 bang kernel: [51585.825061] HighMem: 1212*4kB (UMC) 538*8kB (UM) 160*16kB (UM) 140*32kB (UMC) 108*64kB (UMC) 34*128kB (UM) 19*256kB (UMC) 10*512kB (UM) 8*1024kB (UMC) 7*2048kB (UMC) 73*4096kB (UMC) = 358976kB
Aug 3 12:59:01 bang kernel: [51585.825558] 247387 total pagecache pages
Aug 3 12:59:01 bang kernel: [51585.825596] 18 pages in swap cache
Aug 3 12:59:01 bang kernel: [51585.825636] Swap cache stats: add 1360, delete 1342, find 33/71
Aug 3 12:59:01 bang kernel: [51585.825672] Free swap = 4190368kB
Aug 3 12:59:01 bang kernel: [51585.825705] Total swap = 4194300kB
Aug 3 12:59:01 bang kernel: [51585.825739] 514560 pages RAM
Aug 3 12:59:01 bang kernel: [51585.825772] 322048 pages HighMem/MovableOnly
Aug 3 12:59:01 bang kernel: [51585.825804] 8464 pages reserved
Aug 3 12:59:01 bang kernel: [51585.825836] 32768 pages cma reserved
Aug 3 12:59:01 bang kernel: [51585.825869] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Aug 3 12:59:01 bang kernel: [51585.825958] [ 2363] 0 2363 2724 664 8 0 13 -1000 systemd-udevd
Aug 3 12:59:01 bang kernel: [51585.826019] [ 4035] 0 4035 1736 445 7 0 16 0 syslog-ng
Aug 3 12:59:01 bang kernel: [51585.826073] [ 4036] 0 4036 11306 1067 15 0 38 0 syslog-ng
Aug 3 12:59:01 bang kernel: [51585.826123] [ 4037] 0 4037 1149 639 7 0 0 0 log_to_sql.sh
Aug 3 12:59:01 bang kernel: [51585.826173] [ 4235] 60 4235 57365 13082 62 0 881 0 mysqld
Aug 3 12:59:01 bang kernel: [51585.826222] [ 4283] 107 4283 2557 1006 9 0 0 0 ulogd
Aug 3 12:59:01 bang kernel: [51585.826268] [ 4698] 0 4698 899 404 5 0 0 0 pppd
Aug 3 12:59:01 bang kernel: [51585.826316] [ 4762] 105 4762 1183 472 6 0 0 0 dnsmasq
Aug 3 12:59:01 bang kernel: [51585.826363] [ 4970] 0 4970 1292 542 7 0 0 -1000 sshd
Aug 3 12:59:01 bang kernel: [51585.826410] [ 5079] 0 5079 32467 4668 25 0 0 0 apache2
Aug 3 12:59:01 bang kernel: [51585.826457] [ 5081] 81 5081 168576 28259 140 0 0 0 apache2
Aug 3 12:59:01 bang kernel: [51585.826504] [ 5082] 81 5082 173465 34888 154 0 0 0 apache2
Aug 3 12:59:01 bang kernel: [51585.826550] [ 5211] 0 5211 594 29 5 0 0 0 atd
Aug 3 12:59:01 bang kernel: [51585.826597] [ 5239] 102 5239 777 430 5 0 0 0 dbus-daemon
Aug 3 12:59:01 bang kernel: [51585.826644] [ 5299] 103 5299 2665 2156 11 0 0 0 dhcpd
Aug 3 12:59:01 bang kernel: [51585.826691] [ 5365] 240 5365 601 209 5 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.826738] [ 5366] 240 5366 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.826784] [ 5399] 123 5399 1874 1411 10 0 0 0 ntpd
Aug 3 12:59:01 bang kernel: [51585.826830] [ 5428] 240 5428 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.826876] [ 5433] 0 5433 929 617 7 0 0 0 dovecot
Aug 3 12:59:01 bang kernel: [51585.826922] [ 5443] 97 5443 700 512 6 0 0 0 anvil
Aug 3 12:59:01 bang kernel: [51585.826968] [ 5444] 0 5444 733 561 5 0 0 0 log
Aug 3 12:59:01 bang kernel: [51585.827015] [ 5470] 8 5470 10720 1045 14 0 0 0 exim
Aug 3 12:59:01 bang kernel: [51585.827061] [ 5477] 240 5477 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.827107] [ 5497] 240 5497 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.827153] [ 5500] 0 5500 20882 2674 21 0 0 0 fail2ban-server
Aug 3 12:59:01 bang kernel: [51585.827199] [ 5502] 0 5502 1677 1007 7 0 0 0 screen
Aug 3 12:59:01 bang kernel: [51585.827246] [ 5503] 240 5503 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.827291] [ 5504] 0 5504 1295 804 8 0 0 0 bash
Aug 3 12:59:01 bang kernel: [51585.827339] [ 5505] 0 5505 1347 704 6 0 0 0 top
Aug 3 12:59:01 bang kernel: [51585.827385] [ 5506] 0 5506 842 102 5 0 0 0 tail
Aug 3 12:59:01 bang kernel: [51585.827431] [ 5507] 0 5507 842 100 6 0 0 0 tail
Aug 3 12:59:01 bang kernel: [51585.827477] [ 5510] 0 5510 1150 584 7 0 0 0 multitail.sh
Aug 3 12:59:01 bang kernel: [51585.827524] [ 5519] 0 5519 2466 1794 9 0 0 0 multitail
Aug 3 12:59:01 bang kernel: [51585.827572] [ 5526] 0 5526 941 651 6 0 0 0 gam_server
Aug 3 12:59:01 bang kernel: [51585.827618] [ 5527] 0 5527 842 108 6 0 0 0 tail
Aug 3 12:59:01 bang kernel: [51585.827664] [ 5528] 0 5528 842 105 5 0 0 0 tail
Aug 3 12:59:01 bang kernel: [51585.827710] [ 5529] 0 5529 842 100 5 0 0 0 tail
Aug 3 12:59:01 bang kernel: [51585.827756] [ 5530] 0 5530 842 355 6 0 0 0 tail
Aug 3 12:59:01 bang kernel: [51585.827802] [ 5531] 0 5531 843 386 6 0 0 0 tail
Aug 3 12:59:01 bang kernel: [51585.827848] [ 5532] 240 5532 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.827894] [ 5550] 240 5550 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.827940] [ 5622] 0 5622 615 442 5 0 0 0 rpcbind
Aug 3 12:59:01 bang kernel: [51585.827986] [ 5634] 240 5634 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.828032] [ 5652] 0 5652 787 572 5 0 0 0 rpc.statd
Aug 3 12:59:01 bang kernel: [51585.828078] [ 5707] 0 5707 789 46 5 0 0 0 rpc.idmapd
Aug 3 12:59:01 bang kernel: [51585.828124] [ 5733] 240 5733 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.828170] [ 5747] 0 5747 856 497 5 0 0 0 rpc.mountd
Aug 3 12:59:01 bang kernel: [51585.828220] [ 5804] 101 5804 562 367 6 0 0 0 radvd
Aug 3 12:59:01 bang kernel: [51585.828266] [ 5805] 0 5805 562 239 6 0 0 0 radvd
Aug 3 12:59:01 bang kernel: [51585.828313] [ 5839] 240 5839 601 25 4 0 0 0 distccd
Aug 3 12:59:01 bang kernel: [51585.828359] [ 5860] 0 5860 1150 618 5 0 0 0 heating.sh
Aug 3 12:59:01 bang kernel: [51585.828405] [ 5898] 0 5898 1007 451 6 0 0 0 agetty
Aug 3 12:59:01 bang kernel: [51585.828451] [ 5899] 0 5899 1007 436 7 0 0 0 agetty
Aug 3 12:59:01 bang kernel: [51585.828497] [ 5900] 0 5900 1007 419 6 0 0 0 agetty
Aug 3 12:59:01 bang kernel: [51585.828543] [ 5901] 0 5901 1007 435 5 0 0 0 agetty
Aug 3 12:59:01 bang kernel: [51585.828589] [ 5902] 0 5902 1007 436 6 0 0 0 agetty
Aug 3 12:59:01 bang kernel: [51585.828779] [ 5903] 0 5903 1007 449 7 0 0 0 agetty
Aug 3 12:59:01 bang kernel: [51585.828827] [ 5904] 0 5904 609 420 5 0 0 0 agetty
Aug 3 12:59:01 bang kernel: [51585.828875] [ 6004] 0 6004 1455 921 7 0 0 0 bluetoothd
Aug 3 12:59:01 bang kernel: [51585.828926] [ 6010] 0 6010 39540 7714 43 0 0 0 python2
Aug 3 12:59:01 bang kernel: [51585.828974] [ 3224] 0 3224 2247 1027 10 0 0 0 sshd
Aug 3 12:59:01 bang kernel: [51585.829021] [ 3227] 1000 3227 2247 945 9 0 0 0 sshd
Aug 3 12:59:01 bang kernel: [51585.829066] [ 3228] 1000 3228 1298 774 8 0 0 0 bash
Aug 3 12:59:01 bang kernel: [51585.829111] [ 3236] 1000 3236 1347 645 6 0 0 0 su
Aug 3 12:59:01 bang kernel: [51585.829155] [ 3238] 0 3238 1298 799 7 0 0 0 bash
Aug 3 12:59:01 bang kernel: [51585.829202] [ 880] 0 880 1082 759 7 0 0 0 config
Aug 3 12:59:01 bang kernel: [51585.829247] [ 1099] 106 1099 1327 1093 7 0 0 0 imap-login
Aug 3 12:59:01 bang kernel: [51585.829334] [ 1111] 8 1111 1046 872 6 0 0 0 imap
Aug 3 12:59:01 bang kernel: [51585.829449] [10717] 0 10717 1299 765 7 0 0 0 bash
Aug 3 12:59:01 bang kernel: [51585.829564] [10784] 0 10784 2885 1232 9 0 0 0 mysql
Aug 3 12:59:01 bang kernel: [51585.829701] [16321] 40 16321 32298 9969 39 0 0 0 named
Aug 3 12:59:01 bang kernel: [51585.829900] [24379] 0 24379 996 411 6 0 0 0 cron
Aug 3 12:59:01 bang kernel: [51585.830042] [25814] 0 25814 2270 1056 10 0 0 0 sshd
Aug 3 12:59:01 bang kernel: [51585.830162] [25818] 1000 25818 2304 943 8 0 0 0 sshd
Aug 3 12:59:01 bang kernel: [51585.830290] [25819] 1000 25819 1298 769 6 0 0 0 bash
Aug 3 12:59:01 bang kernel: [51585.830405] [25827] 1000 25827 1347 642 7 0 0 0 su
Aug 3 12:59:01 bang kernel: [51585.830505] [25828] 0 25828 1298 760 8 0 0 0 bash
Aug 3 12:59:01 bang kernel: [51585.830620] [25834] 0 25834 1242 565 7 0 0 0 screen
Aug 3 12:59:01 bang kernel: [51585.830753] [12903] 0 12903 1299 788 7 0 0 0 bash
Aug 3 12:59:01 bang kernel: [51585.830872] [25975] 0 25975 6895 579 11 0 0 0 dump1090
Aug 3 12:59:01 bang kernel: [51585.831006] Out of memory: Kill process 5082 (apache2) score 22 or sacrifice child
Aug 3 12:59:01 bang kernel: [51585.832683] Killed process 5082 (apache2) total-vm:693860kB, anon-rss:118856kB, file-rss:13300kB, shmem-rss:7396kB
The problem is that swap is barely being used. Why wasn't anything swapped out instead of invoking the OOM killer?
Here are the VM settings:
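One plausible reading of the report: the failing request was `gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), order=2`, i.e. a kernel allocation of 2^2 physically contiguous 4 kB pages (16 kB), made by kmalloc from the usbfs ioctl path that dump1090 uses. GFP_KERNEL requests cannot be served from HighMem, so only the Normal zone counts, and in its buddy line every free block of 16 kB or larger is tagged only `H`, meaning it sits in the high-order atomic reserve rather than being available to ordinary allocations. On that reading, swap could not have helped: swapping out anonymous pages frees individual pages that are not necessarily contiguous, so the kernel has to compact memory instead, and the 4.7.0 to 4.7.4 bug would explain it giving up and calling the OOM killer. The sketch below uses a hypothetical helper, `usable_kb` (not a kernel or procps tool), to make that tally explicit for a buddy line copied out of the report.

```shell
#!/bin/sh
# Sketch: sum the free memory (kB) in one zone's buddy-allocator line from an
# OOM report, counting only blocks of at least MIN_KB whose migrate-type list
# is not just "H" (the high-order atomic reserve).
# Caveat: the report lists which types occur at each size, not per-type
# counts, so a mixed entry such as "(UMEH)" is counted in full (upper bound).
usable_kb() {
    min_kb=$1
    line=$2
    printf '%s\n' "$line" | awk -v min="$min_kb" '
    {
        total = 0
        for (i = 1; i <= NF; i++) {
            # Entries look like "11*16kB", optionally followed by "(H)".
            if ($i !~ /^[0-9]+[*][0-9]+kB$/)
                continue
            split($i, parts, "[*k]")      # "11*16kB" -> "11", "16", "B"
            count = parts[1] + 0
            size = parts[2] + 0
            types = ""
            if (i < NF && substr($(i + 1), 1, 1) == "(")
                types = $(i + 1)
            gsub(/[()]/, "", types)
            # Skip empty sizes, blocks too small for the request, and H-only blocks.
            if (count > 0 && size >= min + 0 && types != "H")
                total += count * size
        }
        print total
    }'
}
```

Fed the complete `Normal:` line from the report with MIN_KB=16 (one order-2 request), it prints 0 even though the zone has about 13 MB free; with MIN_KB=4 it prints 9400, all of it in 4 kB and 8 kB blocks.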
root@bang:~> grep '' /proc/sys/vm/*
/proc/sys/vm/admin_reserve_kbytes:8192
/proc/sys/vm/block_dump:0
grep: /proc/sys/vm/compact_memory: Permission denied
/proc/sys/vm/compact_unevictable_allowed:1
/proc/sys/vm/dirty_background_bytes:0
/proc/sys/vm/dirty_background_ratio:10
/proc/sys/vm/dirty_bytes:0
/proc/sys/vm/dirty_expire_centisecs:3000
/proc/sys/vm/dirty_ratio:20
/proc/sys/vm/dirtytime_expire_seconds:43200
/proc/sys/vm/dirty_writeback_centisecs:500
/proc/sys/vm/drop_caches:0
/proc/sys/vm/extfrag_threshold:500
/proc/sys/vm/highmem_is_dirtyable:0
/proc/sys/vm/laptop_mode:0
/proc/sys/vm/legacy_va_layout:0
/proc/sys/vm/lowmem_reserve_ratio:32 32
/proc/sys/vm/max_map_count:65530
/proc/sys/vm/min_free_kbytes:3420
/proc/sys/vm/mmap_min_addr:4096
/proc/sys/vm/mmap_rnd_bits:8
/proc/sys/vm/nr_pdflush_threads:0
/proc/sys/vm/oom_dump_tasks:1
/proc/sys/vm/oom_kill_allocating_task:0
/proc/sys/vm/overcommit_kbytes:0
/proc/sys/vm/overcommit_memory:0
/proc/sys/vm/overcommit_ratio:50
/proc/sys/vm/page-cluster:3
/proc/sys/vm/panic_on_oom:0
/proc/sys/vm/percpu_pagelist_fraction:0
/proc/sys/vm/stat_interval:1
/proc/sys/vm/swappiness:50
/proc/sys/vm/user_reserve_kbytes:62869
/proc/sys/vm/vfs_cache_pressure:100
/proc/sys/vm/watermark_scale_factor:10
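As a sanity check on these settings, the Normal zone watermarks in the OOM report (min:3420kB low:4272kB high:5124kB) can be reproduced from `min_free_kbytes` and `watermark_scale_factor`. The sketch below assumes 4 kB pages, that Normal is the only lowmem zone (so it receives all of `min_free_kbytes`), and the 4.7-era rule low = min + max(min/4, managed × scale / 10000), high = min + 2 × that gap. The function name `normal_wmarks` is hypothetical.

```shell
#!/bin/sh
# Sketch: recompute the Normal-zone watermarks shown in the OOM report.
# Assumptions: 4 kB pages; Normal is the only lowmem zone; 4.7-era formula.
normal_wmarks() {
    min_free_kbytes=$1   # /proc/sys/vm/min_free_kbytes
    scale=$2             # /proc/sys/vm/watermark_scale_factor
    managed_kb=$3        # "managed:" value of the Normal zone in the report
    min_pages=$((min_free_kbytes / 4))
    managed_pages=$((managed_kb / 4))
    gap=$((min_pages / 4))                        # min >> 2
    by_scale=$((managed_pages * scale / 10000))   # managed * scale / 10000
    if [ "$by_scale" -gt "$gap" ]; then
        gap=$by_scale
    fi
    min_kb=$(( min_pages * 4 ))
    low_kb=$(( (min_pages + gap) * 4 ))
    high_kb=$(( (min_pages + 2 * gap) * 4 ))
    echo "min:${min_kb}kB low:${low_kb}kB high:${high_kb}kB"
}
```

`normal_wmarks 3420 10 736192` prints `min:3420kB low:4272kB high:5124kB`, matching the report. These watermarks are a few MB, well below the zone's 13 MB of free memory, which fits the picture of fragmentation rather than an overall shortage.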
The kernel is mainline 4.7 with a few Exynos patches:
Linux bang 4.7.0-41238-g206dbde-dirty #16 SMP PREEMPT Tue Aug 2 22:35:38 BST 2016 armv7l SAMSUNG EXYNOS (Flattened Device Tree) GNU/Linux
Now, since I built the kernel myself, it's quite possible I have a wrong option set somewhere. Any help would be appreciated.
[EDIT1]: This seems to happen under heavy I/O, but I haven't determined whether it's related to the cache filling up or to something else.
[EDIT2]: There seems to be a discussion (at the moment) on the kernel mailing lists about what looks like an identical problem. I'll follow it and report back on the outcome.
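To test the EDIT1 hypothesis on the live system, one option is to sample /proc/buddyinfo before and during heavy I/O; it lists free block counts per zone, with successive columns for order 0, 1, 2 and so on. The hypothetical helper `high_order_free_kb` below totals the free kB held in blocks of at least a given order for one zone, assuming 4 kB pages. Unlike the OOM report, /proc/buddyinfo does not split counts by migrate type (/proc/pagetypeinfo does), so highatomic blocks are included in the total.

```shell
#!/bin/sh
# Sketch: free memory (kB) in blocks of order >= MIN_ORDER for one zone,
# read from /proc/buddyinfo or from a saved copy of it.
# Assumes 4 kB pages; columns after the zone name are counts for order 0, 1, 2, ...
high_order_free_kb() {
    zone=$1
    min_order=$2
    file=${3:-/proc/buddyinfo}
    awk -v zone="$zone" -v min="$min_order" '
    $4 == zone {
        sum = 0
        for (i = 5; i <= NF; i++) {
            order = i - 5
            # A block of order n spans 2^n pages of 4 kB each.
            if (order >= min)
                sum += $i * 4 * 2 ^ order
        }
        print sum
    }' "$file"
}
```

For example, `while sleep 60; do high_order_free_kb Normal 2; done` during an I/O-heavy period would show whether 16 kB-and-larger blocks in Normal dwindle as the page cache grows. With the Normal counts from the OOM report above it would print 3920.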
I'm getting this error with the FC25 4.8.8 kernel, with the same workload as above (apache and dovecot). According to the link, it is caused by fragmentation of the file cache, and there is a workaround that clears the cache on a regular basis via cron until the fix lands in 4.9:
sync && echo 1 > /proc/sys/vm/drop_caches
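Scheduled via cron, that workaround might look like the system crontab entry below. The file name /etc/cron.d/drop-caches and the 30-minute interval are assumptions, not part of the answer. Writing 1 to drop_caches drops only clean page cache (2 drops dentries and inodes, 3 drops both), and running `sync` first writes out dirty pages so more of the cache is clean and can be dropped. This only mitigates the symptom; the fix itself is a kernel that includes the patch.

```shell
# /etc/cron.d/drop-caches (hypothetical file name; cron.d entries include a user field)
# Every 30 minutes (an arbitrary interval): flush dirty data, then drop clean page cache.
*/30 * * * * root sync && echo 1 > /proc/sys/vm/drop_caches
```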