Why is the OOM killer killing processes when swap is barely used?


I have an ARM-based server with just under 2 GB of addressable memory and 4 GB of swap enabled:

root@bang:~> free -m
              total        used        free      shared  buff/cache   available
Mem:           1976         388          48          15        1539        1487
Swap:          4095           1        4094

Once the system has been up for about a day, the OOM killer starts getting aggressive and killing things:

Aug  3 12:59:01 bang kernel: [51585.822794] dump1090 invoked oom-killer: gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), order=2, oom_score_adj=0
Aug  3 12:59:01 bang kernel: [51585.822851] dump1090 cpuset=/ mems_allowed=0
Aug  3 12:59:01 bang kernel: [51585.822963] CPU: 6 PID: 25989 Comm: dump1090 Tainted: G         C      4.7.0-41238-g206dbde-dirty #16
Aug  3 12:59:01 bang kernel: [51585.823010] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Aug  3 12:59:01 bang kernel: [51585.823120] [<c010e4ec>] (unwind_backtrace) from [<c010b234>] (show_stack+0x10/0x14)
Aug  3 12:59:01 bang kernel: [51585.823203] [<c010b234>] (show_stack) from [<c04eff84>] (dump_stack+0x88/0x9c)
Aug  3 12:59:01 bang kernel: [51585.823283] [<c04eff84>] (dump_stack) from [<c0227830>] (dump_header+0x5c/0x1b0)
Aug  3 12:59:01 bang kernel: [51585.823357] [<c0227830>] (dump_header) from [<c01d1aec>] (oom_kill_process+0x328/0x494)
Aug  3 12:59:01 bang kernel: [51585.823420] [<c01d1aec>] (oom_kill_process) from [<c01d1fa0>] (out_of_memory+0x2e0/0x338)
Aug  3 12:59:01 bang kernel: [51585.823487] [<c01d1fa0>] (out_of_memory) from [<c01d6724>] (__alloc_pages_nodemask+0xd80/0xda0)
Aug  3 12:59:01 bang kernel: [51585.823555] [<c01d6724>] (__alloc_pages_nodemask) from [<c01d6a28>] (alloc_kmem_pages+0x18/0xb0)
Aug  3 12:59:01 bang kernel: [51585.823620] [<c01d6a28>] (alloc_kmem_pages) from [<c01ee7a4>] (kmalloc_order+0x10/0x20)
Aug  3 12:59:01 bang kernel: [51585.823688] [<c01ee7a4>] (kmalloc_order) from [<c06435b4>] (proc_submiturb+0x60c/0xe88)
Aug  3 12:59:01 bang kernel: [51585.823749] [<c06435b4>] (proc_submiturb) from [<c06446e4>] (usbdev_do_ioctl+0x8b4/0x1bfc)
Aug  3 12:59:01 bang kernel: [51585.823816] [<c06446e4>] (usbdev_do_ioctl) from [<c023c74c>] (do_vfs_ioctl+0x98/0x8e4)
Aug  3 12:59:01 bang kernel: [51585.823879] [<c023c74c>] (do_vfs_ioctl) from [<c023d004>] (SyS_ioctl+0x6c/0x7c)
Aug  3 12:59:01 bang kernel: [51585.823948] [<c023d004>] (SyS_ioctl) from [<c0107740>] (ret_fast_syscall+0x0/0x3c)
Aug  3 12:59:01 bang kernel: [51585.823987] Mem-Info:
Aug  3 12:59:01 bang kernel: [51585.824073] active_anon:43846 inactive_anon:46454 isolated_anon:0
Aug  3 12:59:01 bang kernel: [51585.824073]  active_file:132799 inactive_file:109909 isolated_file:19
Aug  3 12:59:01 bang kernel: [51585.824073]  unevictable:1408 dirty:56 writeback:0 unstable:0
Aug  3 12:59:01 bang kernel: [51585.824073]  slab_reclaimable:17104 slab_unreclaimable:6387
Aug  3 12:59:01 bang kernel: [51585.824073]  mapped:13368 shmem:3582 pagetables:971 bounce:0
Aug  3 12:59:01 bang kernel: [51585.824073]  free:92967 free_pcp:31 free_cma:32601
Aug  3 12:59:01 bang kernel: [51585.824216] Normal free:13240kB min:3420kB low:4272kB high:5124kB active_anon:26652kB inactive_anon:26692kB active_file:360240kB inactive_file:194904kB unevictable:1336kB isolated(anon):0kB isolated(file):76kB present:770048kB managed:736192kB mlocked:1336kB dirty:16kB writeback:0kB mapped:11600kB shmem:900kB slab_reclaimable:68416kB slab_unreclaimable:25548kB kernel_stack:3384kB pagetables:3884kB unstable:0kB bounce:0kB free_pcp:124kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug  3 12:59:01 bang kernel: [51585.824259] lowmem_reserve[]: 0 9040 9040
Aug  3 12:59:01 bang kernel: [51585.824442] HighMem free:358664kB min:512kB low:1864kB high:3216kB active_anon:148732kB inactive_anon:159124kB active_file:170956kB inactive_file:244732kB unevictable:4296kB isolated(anon):0kB isolated(file):0kB present:1288192kB managed:1288192kB mlocked:4296kB dirty:208kB writeback:0kB mapped:41872kB shmem:13428kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:130404kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug  3 12:59:01 bang kernel: [51585.824483] lowmem_reserve[]: 0 0 0
Aug  3 12:59:01 bang kernel: [51585.824592] Normal: 1300*4kB (UMEH) 525*8kB (UMEH) 11*16kB (H) 9*32kB (H) 8*64kB (H) 5*128kB (H) 3*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 13320kB
Aug  3 12:59:01 bang kernel: [51585.825061] HighMem: 1212*4kB (UMC) 538*8kB (UM) 160*16kB (UM) 140*32kB (UMC) 108*64kB (UMC) 34*128kB (UM) 19*256kB (UMC) 10*512kB (UM) 8*1024kB (UMC) 7*2048kB (UMC) 73*4096kB (UMC) = 358976kB
Aug  3 12:59:01 bang kernel: [51585.825558] 247387 total pagecache pages
Aug  3 12:59:01 bang kernel: [51585.825596] 18 pages in swap cache
Aug  3 12:59:01 bang kernel: [51585.825636] Swap cache stats: add 1360, delete 1342, find 33/71
Aug  3 12:59:01 bang kernel: [51585.825672] Free swap  = 4190368kB
Aug  3 12:59:01 bang kernel: [51585.825705] Total swap = 4194300kB
Aug  3 12:59:01 bang kernel: [51585.825739] 514560 pages RAM
Aug  3 12:59:01 bang kernel: [51585.825772] 322048 pages HighMem/MovableOnly
Aug  3 12:59:01 bang kernel: [51585.825804] 8464 pages reserved
Aug  3 12:59:01 bang kernel: [51585.825836] 32768 pages cma reserved
Aug  3 12:59:01 bang kernel: [51585.825869] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Aug  3 12:59:01 bang kernel: [51585.825958] [ 2363]     0  2363     2724      664       8       0       13         -1000 systemd-udevd
Aug  3 12:59:01 bang kernel: [51585.826019] [ 4035]     0  4035     1736      445       7       0       16             0 syslog-ng
Aug  3 12:59:01 bang kernel: [51585.826073] [ 4036]     0  4036    11306     1067      15       0       38             0 syslog-ng
Aug  3 12:59:01 bang kernel: [51585.826123] [ 4037]     0  4037     1149      639       7       0        0             0 log_to_sql.sh
Aug  3 12:59:01 bang kernel: [51585.826173] [ 4235]    60  4235    57365    13082      62       0      881             0 mysqld
Aug  3 12:59:01 bang kernel: [51585.826222] [ 4283]   107  4283     2557     1006       9       0        0             0 ulogd
Aug  3 12:59:01 bang kernel: [51585.826268] [ 4698]     0  4698      899      404       5       0        0             0 pppd
Aug  3 12:59:01 bang kernel: [51585.826316] [ 4762]   105  4762     1183      472       6       0        0             0 dnsmasq
Aug  3 12:59:01 bang kernel: [51585.826363] [ 4970]     0  4970     1292      542       7       0        0         -1000 sshd
Aug  3 12:59:01 bang kernel: [51585.826410] [ 5079]     0  5079    32467     4668      25       0        0             0 apache2
Aug  3 12:59:01 bang kernel: [51585.826457] [ 5081]    81  5081   168576    28259     140       0        0             0 apache2
Aug  3 12:59:01 bang kernel: [51585.826504] [ 5082]    81  5082   173465    34888     154       0        0             0 apache2
Aug  3 12:59:01 bang kernel: [51585.826550] [ 5211]     0  5211      594       29       5       0        0             0 atd
Aug  3 12:59:01 bang kernel: [51585.826597] [ 5239]   102  5239      777      430       5       0        0             0 dbus-daemon
Aug  3 12:59:01 bang kernel: [51585.826644] [ 5299]   103  5299     2665     2156      11       0        0             0 dhcpd
Aug  3 12:59:01 bang kernel: [51585.826691] [ 5365]   240  5365      601      209       5       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.826738] [ 5366]   240  5366      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.826784] [ 5399]   123  5399     1874     1411      10       0        0             0 ntpd
Aug  3 12:59:01 bang kernel: [51585.826830] [ 5428]   240  5428      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.826876] [ 5433]     0  5433      929      617       7       0        0             0 dovecot
Aug  3 12:59:01 bang kernel: [51585.826922] [ 5443]    97  5443      700      512       6       0        0             0 anvil
Aug  3 12:59:01 bang kernel: [51585.826968] [ 5444]     0  5444      733      561       5       0        0             0 log
Aug  3 12:59:01 bang kernel: [51585.827015] [ 5470]     8  5470    10720     1045      14       0        0             0 exim
Aug  3 12:59:01 bang kernel: [51585.827061] [ 5477]   240  5477      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.827107] [ 5497]   240  5497      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.827153] [ 5500]     0  5500    20882     2674      21       0        0             0 fail2ban-server
Aug  3 12:59:01 bang kernel: [51585.827199] [ 5502]     0  5502     1677     1007       7       0        0             0 screen
Aug  3 12:59:01 bang kernel: [51585.827246] [ 5503]   240  5503      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.827291] [ 5504]     0  5504     1295      804       8       0        0             0 bash
Aug  3 12:59:01 bang kernel: [51585.827339] [ 5505]     0  5505     1347      704       6       0        0             0 top
Aug  3 12:59:01 bang kernel: [51585.827385] [ 5506]     0  5506      842      102       5       0        0             0 tail
Aug  3 12:59:01 bang kernel: [51585.827431] [ 5507]     0  5507      842      100       6       0        0             0 tail
Aug  3 12:59:01 bang kernel: [51585.827477] [ 5510]     0  5510     1150      584       7       0        0             0 multitail.sh
Aug  3 12:59:01 bang kernel: [51585.827524] [ 5519]     0  5519     2466     1794       9       0        0             0 multitail
Aug  3 12:59:01 bang kernel: [51585.827572] [ 5526]     0  5526      941      651       6       0        0             0 gam_server
Aug  3 12:59:01 bang kernel: [51585.827618] [ 5527]     0  5527      842      108       6       0        0             0 tail
Aug  3 12:59:01 bang kernel: [51585.827664] [ 5528]     0  5528      842      105       5       0        0             0 tail
Aug  3 12:59:01 bang kernel: [51585.827710] [ 5529]     0  5529      842      100       5       0        0             0 tail
Aug  3 12:59:01 bang kernel: [51585.827756] [ 5530]     0  5530      842      355       6       0        0             0 tail
Aug  3 12:59:01 bang kernel: [51585.827802] [ 5531]     0  5531      843      386       6       0        0             0 tail
Aug  3 12:59:01 bang kernel: [51585.827848] [ 5532]   240  5532      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.827894] [ 5550]   240  5550      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.827940] [ 5622]     0  5622      615      442       5       0        0             0 rpcbind
Aug  3 12:59:01 bang kernel: [51585.827986] [ 5634]   240  5634      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.828032] [ 5652]     0  5652      787      572       5       0        0             0 rpc.statd
Aug  3 12:59:01 bang kernel: [51585.828078] [ 5707]     0  5707      789       46       5       0        0             0 rpc.idmapd
Aug  3 12:59:01 bang kernel: [51585.828124] [ 5733]   240  5733      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.828170] [ 5747]     0  5747      856      497       5       0        0             0 rpc.mountd
Aug  3 12:59:01 bang kernel: [51585.828220] [ 5804]   101  5804      562      367       6       0        0             0 radvd
Aug  3 12:59:01 bang kernel: [51585.828266] [ 5805]     0  5805      562      239       6       0        0             0 radvd
Aug  3 12:59:01 bang kernel: [51585.828313] [ 5839]   240  5839      601       25       4       0        0             0 distccd
Aug  3 12:59:01 bang kernel: [51585.828359] [ 5860]     0  5860     1150      618       5       0        0             0 heating.sh
Aug  3 12:59:01 bang kernel: [51585.828405] [ 5898]     0  5898     1007      451       6       0        0             0 agetty
Aug  3 12:59:01 bang kernel: [51585.828451] [ 5899]     0  5899     1007      436       7       0        0             0 agetty
Aug  3 12:59:01 bang kernel: [51585.828497] [ 5900]     0  5900     1007      419       6       0        0             0 agetty
Aug  3 12:59:01 bang kernel: [51585.828543] [ 5901]     0  5901     1007      435       5       0        0             0 agetty
Aug  3 12:59:01 bang kernel: [51585.828589] [ 5902]     0  5902     1007      436       6       0        0             0 agetty
Aug  3 12:59:01 bang kernel: [51585.828779] [ 5903]     0  5903     1007      449       7       0        0             0 agetty
Aug  3 12:59:01 bang kernel: [51585.828827] [ 5904]     0  5904      609      420       5       0        0             0 agetty
Aug  3 12:59:01 bang kernel: [51585.828875] [ 6004]     0  6004     1455      921       7       0        0             0 bluetoothd
Aug  3 12:59:01 bang kernel: [51585.828926] [ 6010]     0  6010    39540     7714      43       0        0             0 python2
Aug  3 12:59:01 bang kernel: [51585.828974] [ 3224]     0  3224     2247     1027      10       0        0             0 sshd
Aug  3 12:59:01 bang kernel: [51585.829021] [ 3227]  1000  3227     2247      945       9       0        0             0 sshd
Aug  3 12:59:01 bang kernel: [51585.829066] [ 3228]  1000  3228     1298      774       8       0        0             0 bash
Aug  3 12:59:01 bang kernel: [51585.829111] [ 3236]  1000  3236     1347      645       6       0        0             0 su
Aug  3 12:59:01 bang kernel: [51585.829155] [ 3238]     0  3238     1298      799       7       0        0             0 bash
Aug  3 12:59:01 bang kernel: [51585.829202] [  880]     0   880     1082      759       7       0        0             0 config
Aug  3 12:59:01 bang kernel: [51585.829247] [ 1099]   106  1099     1327     1093       7       0        0             0 imap-login
Aug  3 12:59:01 bang kernel: [51585.829334] [ 1111]     8  1111     1046      872       6       0        0             0 imap
Aug  3 12:59:01 bang kernel: [51585.829449] [10717]     0 10717     1299      765       7       0        0             0 bash
Aug  3 12:59:01 bang kernel: [51585.829564] [10784]     0 10784     2885     1232       9       0        0             0 mysql
Aug  3 12:59:01 bang kernel: [51585.829701] [16321]    40 16321    32298     9969      39       0        0             0 named
Aug  3 12:59:01 bang kernel: [51585.829900] [24379]     0 24379      996      411       6       0        0             0 cron
Aug  3 12:59:01 bang kernel: [51585.830042] [25814]     0 25814     2270     1056      10       0        0             0 sshd
Aug  3 12:59:01 bang kernel: [51585.830162] [25818]  1000 25818     2304      943       8       0        0             0 sshd
Aug  3 12:59:01 bang kernel: [51585.830290] [25819]  1000 25819     1298      769       6       0        0             0 bash
Aug  3 12:59:01 bang kernel: [51585.830405] [25827]  1000 25827     1347      642       7       0        0             0 su
Aug  3 12:59:01 bang kernel: [51585.830505] [25828]     0 25828     1298      760       8       0        0             0 bash
Aug  3 12:59:01 bang kernel: [51585.830620] [25834]     0 25834     1242      565       7       0        0             0 screen
Aug  3 12:59:01 bang kernel: [51585.830753] [12903]     0 12903     1299      788       7       0        0             0 bash
Aug  3 12:59:01 bang kernel: [51585.830872] [25975]     0 25975     6895      579      11       0        0             0 dump1090
Aug  3 12:59:01 bang kernel: [51585.831006] Out of memory: Kill process 5082 (apache2) score 22 or sacrifice child
Aug  3 12:59:01 bang kernel: [51585.832683] Killed process 5082 (apache2) total-vm:693860kB, anon-rss:118856kB, file-rss:13300kB, shmem-rss:7396kB

The problem is that swap is barely being used. Why wasn't anything swapped out instead of invoking the OOM killer?
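One way to look at this: the log shows an order=2 GFP_KERNEL request (four contiguous 4 kB pages, 16 kB), and a GFP_KERNEL allocation on this 32-bit system can only be served from the Normal (lowmem) zone, not HighMem. Free swap space by itself does not produce a contiguous free block. The sketch below, built around a hypothetical helper named free_at_order, sums the free memory held in blocks of at least a given order, per zone, from /proc/buddyinfo, assuming 4 kB base pages. It is a diagnostic aid only. /proc/buddyinfo does not show migrate types, so the (H) markers on the log's Normal line (blocks in the high-order atomic reserve) are only visible in the OOM dump or in /proc/pagetypeinfo.

```shell
#!/bin/sh
# free_at_order MIN_ORDER [FILE]
# Hypothetical helper: for each zone in /proc/buddyinfo-formatted input,
# print the free memory (kB) held in blocks of order >= MIN_ORDER.
# Assumes 4 kB base pages.
free_at_order() {
    min_order=$1
    file=${2:-/proc/buddyinfo}
    awk -v min="$min_order" '
        # Line format: "Node 0, zone   Normal   c0 c1 c2 ..."
        # Field 5 is the number of free order-0 blocks, field 5+k order-k blocks.
        {
            kb = 0
            for (i = 5 + min; i <= NF; i++)
                kb += $i * 4 * 2 ^ (i - 5)   # an order-k block is 4 kB * 2^k
            printf "%s %d kB\n", $4, kb
        }
    ' "$file"
}
```

Fed the block counts from the Normal: and HighMem: lines of the log above, free_at_order 2 reports 3920 kB for Normal and 349824 kB for HighMem, so nearly all of the larger free blocks sat in HighMem. On a live system, run free_at_order 2 with no file argument to read /proc/buddyinfo directly.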

Here are the VM settings:

root@bang:~> grep '' /proc/sys/vm/*
/proc/sys/vm/admin_reserve_kbytes:8192
/proc/sys/vm/block_dump:0
grep: /proc/sys/vm/compact_memory: Permission denied
/proc/sys/vm/compact_unevictable_allowed:1
/proc/sys/vm/dirty_background_bytes:0
/proc/sys/vm/dirty_background_ratio:10
/proc/sys/vm/dirty_bytes:0
/proc/sys/vm/dirty_expire_centisecs:3000
/proc/sys/vm/dirty_ratio:20
/proc/sys/vm/dirtytime_expire_seconds:43200
/proc/sys/vm/dirty_writeback_centisecs:500
/proc/sys/vm/drop_caches:0
/proc/sys/vm/extfrag_threshold:500
/proc/sys/vm/highmem_is_dirtyable:0
/proc/sys/vm/laptop_mode:0
/proc/sys/vm/legacy_va_layout:0
/proc/sys/vm/lowmem_reserve_ratio:32    32
/proc/sys/vm/max_map_count:65530
/proc/sys/vm/min_free_kbytes:3420
/proc/sys/vm/mmap_min_addr:4096
/proc/sys/vm/mmap_rnd_bits:8
/proc/sys/vm/nr_pdflush_threads:0
/proc/sys/vm/oom_dump_tasks:1
/proc/sys/vm/oom_kill_allocating_task:0
/proc/sys/vm/overcommit_kbytes:0
/proc/sys/vm/overcommit_memory:0
/proc/sys/vm/overcommit_ratio:50
/proc/sys/vm/page-cluster:3
/proc/sys/vm/panic_on_oom:0
/proc/sys/vm/percpu_pagelist_fraction:0
/proc/sys/vm/stat_interval:1
/proc/sys/vm/swappiness:50
/proc/sys/vm/user_reserve_kbytes:62869
/proc/sys/vm/vfs_cache_pressure:100
/proc/sys/vm/watermark_scale_factor:10

The kernel is mainline 4.7 with a few Exynos patches:

Linux bang 4.7.0-41238-g206dbde-dirty #16 SMP PREEMPT Tue Aug 2 22:35:38 BST 2016 armv7l SAMSUNG EXYNOS (Flattened Device Tree) GNU/Linux

Since I built the kernel myself, it's quite possible I have an option set wrong somewhere. Any help would be appreciated.

[EDIT1]: This seems to happen under heavy I/O, but I haven't determined whether it's related to the cache filling up or to something else.
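To test the hunch that the cache filling up under I/O is involved, one option is to log page-cache size next to free lowmem over time and compare it with the OOM timestamps. The sketch below uses a hypothetical helper, lowmem_snapshot, that reads the Cached and LowFree fields of /proc/meminfo; LowFree is only reported on kernels built with CONFIG_HIGHMEM, which this 32-bit ARM kernel evidently is, given the HighMem zone in the log.

```shell
#!/bin/sh
# lowmem_snapshot [FILE]
# Hypothetical helper: print a timestamp plus the Cached and LowFree values
# (kB) taken from /proc/meminfo-formatted input.
lowmem_snapshot() {
    file=${1:-/proc/meminfo}
    awk -v ts="$(date '+%Y-%m-%d %H:%M:%S')" '
        # Match exact field names so "SwapCached:" is not mistaken for "Cached:"
        $1 == "Cached:"  { cached = $2 }
        $1 == "LowFree:" { lowfree = $2 }
        END { printf "%s Cached=%s kB LowFree=%s kB\n", ts, cached, lowfree }
    ' "$file"
}
```

For example, while sleep 60; do lowmem_snapshot >> lowmem.log; done (the log file name is arbitrary) records one line per minute; a LowFree value sinking toward the Normal zone's min watermark while Cached keeps growing would point at lowmem pressure rather than a lack of swap.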

[EDIT2]: There appears to be a discussion (ongoing at the time of writing) on the kernel mailing lists about what looks like an identical problem. I'll keep an eye on it and report back on the outcome.

    
by Steven Davies 03.08.2016 / 20:22

2 Answers


This was caused by a kernel bug present in Linux kernels 4.7.0 through 4.7.4 (it was fixed by this commit in 4.7.5 and this commit in 4.8.0).
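A quick check, if this answer is correct, is whether the running kernel falls inside the reported 4.7.0 to 4.7.4 range. The shell sketch below (oom_bug_affected is a hypothetical name) only encodes that version range, parsing the leading X.Y.Z of a release string such as the asker's 4.7.0-41238-g206dbde-dirty. It cannot tell which fixes a custom or distribution kernel has backported, and the other answer reports similar OOM kills on 4.8.8, so a negative result does not rule the problem out.

```shell
#!/bin/sh
# oom_bug_affected [RELEASE]
# Hypothetical helper: exit status 0 if RELEASE is in 4.7.0..4.7.4, the range
# reported as affected. Defaults to the running kernel (uname -r).
oom_bug_affected() {
    release=${1:-$(uname -r)}
    # Keep the leading "X.Y.Z": "4.7.0-41238-g206dbde-dirty" -> "4.7.0"
    base=${release%%-*}
    major=$(echo "$base" | cut -d. -f1)
    minor=$(echo "$base" | cut -d. -f2)
    # Strip any non-digit suffix from the patch level; a missing one counts as 0
    patch=$(echo "$base" | cut -d. -f3 | tr -cd '0-9')
    [ "$major" = 4 ] && [ "$minor" = 7 ] && [ "${patch:-0}" -le 4 ]
}
```

Usage: oom_bug_affected && echo "running kernel is in the reported affected range".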

    
answered 12.10.2016 / 19:17

I'm getting this error with the FC25 kernel 4.8.8, with the same workload as above (apache and dovecot). According to the link, it's caused by file-cache fragmentation, and there is a workaround: clear the cache on a regular basis via cron until the fix lands in 4.9:

sync && echo 1 > /proc/sys/vm/drop_caches

    
answered 26.11.2016 / 13:34