Linux OOM-killer acting despite plenty of available memory

3

About once a week, the OOM-killer kills a postgres process on my server, even though 'free' says there is plenty of memory available.

I have read several threads here and there, but I can't find any real explanation. Is it really because the server has no swap? Is it a kernel bug (Ubuntu)?

And preemptively: yes, I may add swap. But is there no other solution? Or at least an explanation? :)

Server: Physical Dell
Memory: 64gb RAM and 0 Swap
uname: Linux server-name 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Postgres version: 9.5.10  (8gb shared memory)
vm.overcommit_memory = 0

Output of free -m before the last kill

              total        used        free      shared  buff/cache   available
Mem:          64312        2666         450      8699    61196        52126
Swap:             0           0           0

Kernel log of the last kill

Jun 19 21:29:49 server-name kernel: [17009377.877956] bash invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Jun 19 21:29:49 server-name kernel: [17009377.877959] bash cpuset=/ mems_allowed=0-1
Jun 19 21:29:49 server-name kernel: [17009377.877964] CPU: 23 PID: 61771 Comm: bash Not tainted 4.4.0-62-generic #83-Ubuntu
Jun 19 21:29:49 server-name kernel: [17009377.877966] Hardware name: Dell Inc. PowerEdge M630/0R10KJ, BIOS 2.4.2 01/09/2017
Jun 19 21:29:49 server-name kernel: [17009377.877967]  0000000000000286 00000000d566bdbf ffff88001369baf0 ffffffff813f7c63
Jun 19 21:29:49 server-name kernel: [17009377.877969]  ffff88001369bcc8 ffff88010d85b800 ffff88001369bb60 ffffffff8120ad4e
Jun 19 21:29:49 server-name kernel: [17009377.877971]  0000000000000000 0000000000000700 ffffffff81e42a40 ffff8810547850c0
Jun 19 21:29:49 server-name kernel: [17009377.877973] Call Trace:
Jun 19 21:29:49 server-name kernel: [17009377.877979]  [] dump_stack+0x63/0x90
Jun 19 21:29:49 server-name kernel: [17009377.877984]  [] dump_header+0x5a/0x1c5
Jun 19 21:29:49 server-name kernel: [17009377.877988]  [] oom_kill_process+0x202/0x3c0
Jun 19 21:29:49 server-name kernel: [17009377.877990]  [] ? oom_unkillable_task+0x9e/0xd0
Jun 19 21:29:49 server-name kernel: [17009377.877992]  [] out_of_memory+0x219/0x460
Jun 19 21:29:49 server-name kernel: [17009377.877995]  [] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
Jun 19 21:29:49 server-name kernel: [17009377.877997]  [] __alloc_pages_nodemask+0x286/0x2a0
Jun 19 21:29:49 server-name kernel: [17009377.877999]  [] alloc_kmem_pages_node+0x4b/0xc0
Jun 19 21:29:49 server-name kernel: [17009377.878003]  [] copy_process+0x1be/0x1b70
Jun 19 21:29:49 server-name kernel: [17009377.878007]  [] ? handle_mm_fault+0xce0/0x1820
Jun 19 21:29:49 server-name kernel: [17009377.878010]  [] ? sched_clock+0x9/0x10
Jun 19 21:29:49 server-name kernel: [17009377.878015]  [] ? sched_clock_cpu+0x8f/0xa0
Jun 19 21:29:49 server-name kernel: [17009377.878017]  [] _do_fork+0x80/0x360
Jun 19 21:29:49 server-name kernel: [17009377.878021]  [] ? sigprocmask+0x6f/0xa0
Jun 19 21:29:49 server-name kernel: [17009377.878023]  [] SyS_clone+0x19/0x20
Jun 19 21:29:49 server-name kernel: [17009377.878027]  [] entry_SYSCALL_64_fastpath+0x16/0x71
Jun 19 21:29:49 server-name kernel: [17009377.878028] Mem-Info:
Jun 19 21:29:49 server-name kernel: [17009377.878034] active_anon:2161218 inactive_anon:328736 isolated_anon:0
Jun 19 21:29:49 server-name kernel: [17009377.878034]  active_file:9390648 inactive_file:3525717 isolated_file:0
Jun 19 21:29:49 server-name kernel: [17009377.878034]  unevictable:923 dirty:3206 writeback:0 unstable:0
Jun 19 21:29:49 server-name kernel: [17009377.878034]  slab_reclaimable:427991 slab_unreclaimable:85432
Jun 19 21:29:49 server-name kernel: [17009377.878034]  mapped:2177419 shmem:2227151 pagetables:345413 bounce:0
Jun 19 21:29:49 server-name kernel: [17009377.878034]  free:122878 free_pcp:1 free_cma:0
Jun 19 21:29:49 server-name kernel: [17009377.878037] Node 0 DMA free:14488kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jun 19 21:29:49 server-name kernel: [17009377.878041] lowmem_reserve[]: 0 1820 32015 32015 32015
Jun 19 21:29:49 server-name kernel: [17009377.878044] Node 0 DMA32 free:123340kB min:2552kB low:3188kB high:3828kB active_anon:1066728kB inactive_anon:123988kB active_file:8kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1985352kB managed:1904732kB mlocked:0kB dirty:0kB writeback:0kB mapped:736056kB shmem:744112kB slab_reclaimable:289824kB slab_unreclaimable:200192kB kernel_stack:1552kB pagetables:84288kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 19 21:29:49 server-name kernel: [17009377.878048] lowmem_reserve[]: 0 0 30195 30195 30195
Jun 19 21:29:49 server-name kernel: [17009377.878062] Node 0 Normal free:154400kB min:42332kB low:52912kB high:63496kB active_anon:6913860kB inactive_anon:1054596kB active_file:10248116kB inactive_file:10244456kB unevictable:72kB isolated(anon):0kB isolated(file):0kB present:31457280kB managed:30919988kB mlocked:72kB dirty:1364kB writeback:0kB mapped:7294996kB shmem:7449576kB slab_reclaimable:826280kB slab_unreclaimable:67216kB kernel_stack:6016kB pagetables:1250072kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 19 21:29:49 server-name kernel: [17009377.878065] lowmem_reserve[]: 0 0 0 0 0
Jun 19 21:29:49 server-name kernel: [17009377.878067] Node 1 Normal free:199284kB min:45200kB low:56500kB high:67800kB active_anon:664284kB inactive_anon:136360kB active_file:27314468kB inactive_file:3858404kB unevictable:3620kB isolated(anon):0kB isolated(file):0kB present:33554432kB managed:33015880kB mlocked:3620kB dirty:11460kB writeback:0kB mapped:678624kB shmem:714916kB slab_reclaimable:595860kB slab_unreclaimable:74320kB kernel_stack:4608kB pagetables:47292kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 19 21:29:49 server-name kernel: [17009377.878084] lowmem_reserve[]: 0 0 0 0 0
Jun 19 21:29:49 server-name kernel: [17009377.878086] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 0*64kB 1*128kB (U) 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14488kB
Jun 19 21:29:49 server-name kernel: [17009377.878092] Node 0 DMA32: 12409*4kB (UME) 4093*8kB (UME) 839*16kB (UME) 111*32kB (UME) 68*64kB (UM) 33*128kB (UME) 24*256kB (UME) 14*512kB (UME) 2*1024kB (M) 0*2048kB 0*4096kB = 123292kB
Jun 19 21:29:49 server-name kernel: [17009377.878109] Node 0 Normal: 39107*4kB (UME) 179*8kB (UME) 0*16kB 1*32kB (H) 1*64kB (H) 1*128kB (H) 1*256kB (H) 1*512kB (H) 1*1024kB (H) 0*2048kB 0*4096kB = 159876kB
Jun 19 21:29:49 server-name kernel: [17009377.878115] Node 1 Normal: 50883*4kB (UME) 133*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 204596kB
Jun 19 21:29:49 server-name kernel: [17009377.878120] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 19 21:29:49 server-name kernel: [17009377.878121] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 19 21:29:49 server-name kernel: [17009377.878121] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 19 21:29:49 server-name kernel: [17009377.878122] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 19 21:29:49 server-name kernel: [17009377.878123] 15144218 total pagecache pages
Jun 19 21:29:49 server-name kernel: [17009377.878124] 0 pages in swap cache
Jun 19 21:29:49 server-name kernel: [17009377.878125] Swap cache stats: add 0, delete 0, find 0/0
Jun 19 21:29:49 server-name kernel: [17009377.878126] Free swap  = 0kB
Jun 19 21:29:49 server-name kernel: [17009377.878126] Total swap = 0kB
Jun 19 21:29:49 server-name kernel: [17009377.878127] 16753261 pages RAM
Jun 19 21:29:49 server-name kernel: [17009377.878127] 0 pages HighMem/MovableOnly
Jun 19 21:29:49 server-name kernel: [17009377.878128] 289137 pages reserved
Jun 19 21:29:49 server-name kernel: [17009377.878129] 0 pages cma reserved
Jun 19 21:29:49 server-name kernel: [17009377.878129] 0 pages hwpoisoned
Jun 19 21:29:49 server-name kernel: [17009377.878130] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jun 19 21:29:49 server-name kernel: [17009377.878136] [ 1181]     0  1181    21212    12109      47       3        0             0 systemd-journal
Jun 19 21:29:49 server-name kernel: [17009377.878138] [ 1212]     0  1212    23694      319      16       3        0             0 lvmetad
Jun 19 21:29:49 server-name kernel: [17009377.878139] [ 1226]     0  1226    11507     1164      24       3        0         -1000 systemd-udevd
Jun 19 21:29:49 server-name kernel: [17009377.878141] [ 1776]     0  1776     6512      523      18       3        0             0 atd
Jun 19 21:29:49 server-name kernel: [17009377.878142] [ 1778]   107  1778    10758      949      25       3        0          -900 dbus-daemon
Jun 19 21:29:49 server-name kernel: [17009377.878143] [ 1790]   104  1790    64100      955      27       3        0             0 rsyslogd
Jun 19 21:29:49 server-name kernel: [17009377.878144] [ 1794]     0  1794     7708      591      20       3        0             0 cron
Jun 19 21:29:49 server-name kernel: [17009377.878146] [ 1798]     0  1798     7138      656      18       3        0             0 systemd-logind
Jun 19 21:29:49 server-name kernel: [17009377.878147] [ 1800]     0  1800    77434     1220      20       3        0             0 lxcfs
Jun 19 21:29:49 server-name kernel: [17009377.878148] [ 1805]     0  1805    69421     1395      38       4        0             0 accounts-daemon
Jun 19 21:29:49 server-name kernel: [17009377.878150] [ 1807]     0  1807   385362     6819      84       6        0          -900 snapd
Jun 19 21:29:49 server-name kernel: [17009377.878151] [ 1809]     0  1809     1101      173       9       3        0             0 acpid
Jun 19 21:29:49 server-name kernel: [17009377.878152] [ 1835]     0  1835     3345       42      11       3        0             0 mdadm
Jun 19 21:29:49 server-name kernel: [17009377.878154] [ 1852]     0  1852    69296     1880      37       4        0             0 polkitd
Jun 19 21:29:49 server-name kernel: [17009377.878155] [ 1959]     0  1959    16381     1507      36       3        0         -1000 sshd
Jun 19 21:29:49 server-name kernel: [17009377.878156] [ 1972]     0  1972     1307      412       8       3        0             0 iscsid
Jun 19 21:29:49 server-name kernel: [17009377.878158] [ 1973]     0  1973     1432      917       8       3        0           -17 iscsid
Jun 19 21:29:49 server-name kernel: [17009377.878159] [ 2036]     0  2036     4441      383      13       3        0             0 agetty
Jun 19 21:29:49 server-name kernel: [17009377.878160] [ 2095]     0  2095     4934      597      15       3        0             0 irqbalance
Jun 19 21:29:49 server-name kernel: [17009377.878162] [ 2138]   111  2138    27509     1256      25       3        0             0 ntpd
Jun 19 21:29:49 server-name kernel: [17009377.878163] [ 2323]   112  2323    13971      727      30       3        0             0 exim4
Jun 19 21:29:49 server-name kernel: [17009377.878164] [ 2329]     0  2329    73510     4000      43       4        0             0 fail2ban-server
Jun 19 21:29:49 server-name kernel: [17009377.878166] [ 7103]   113  7103  2203146    66729     188       4        0          -900 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878167] [101917]     0 101917    13563      470      27       3        0             0 keepalived
Jun 19 21:29:49 server-name kernel: [17009377.878169] [101918]     0 101918    14093     1204      33       3        0             0 keepalived
Jun 19 21:29:49 server-name kernel: [17009377.878170] [101919]     0 101919    14093      800      32       3        0             0 keepalived
Jun 19 21:29:49 server-name kernel: [17009377.878172] [126772]   115 126772     5994      664      16       4        0             0 nrpe
Jun 19 21:29:49 server-name kernel: [17009377.878174] [70979]   113 70979  2203419  2135243    4232      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878175] [70980]   113 70980  2203211   282046    3748      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878176] [70981]   113 70981  2203146     5331      69       4        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878178] [70982]   113 70982  2203399     1773      71       4        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878179] [70983]   113 70983    37911     1097      55       4        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878180] [70984]   113 70984  2203919   115562    1754      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878182] [70985]   113 70985  2204540    68113    1213      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878183] [70986]   113 70986  2205899   471030    3891      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878184] [70992]   113 70992  2204243   111679    1550      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878185] [70993]   113 70993  2203484     2784      75       4        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878187] [70994]   113 70994  2205941   541014    3966      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878188] [70995]   113 70995  2206035   408079    3095      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878189] [70996]   113 70996  2203934   160075    2604      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878190] [70997]   113 70997  2203936   218125    2911      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878192] [70998]   113 70998  2204811  1327751    4263      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878193] [70999]   113 70999  2206100  2081582    4267      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878194] [71000]   113 71000  2204024  1694269    4251      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878196] [71001]   113 71001  2209678  2127573    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878197] [71002]   113 71002  2204028  1683854    4251      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878198] [71003]   113 71003  2209601  2118203    4273      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878199] [71004]   113 71004  2203982   955099    4247      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878201] [71005]   113 71005  2204924  1348990    4262      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878202] [71006]   113 71006  2203995  1255468    4247      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878203] [71014]   113 71014  2204016  1562410    4251      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878204] [71015]   113 71015  2204199    70592    1039      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878206] [71016]   113 71016  2209670  2063214    4273      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878207] [71023]   113 71023  2206079   537513    3839      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878208] [71024]   113 71024  2203922   125526    1820      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878210] [71025]   113 71025  2203943   230822    3084      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878211] [71027]   113 71027  2206498   625052    4028      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878212] [71029]   113 71029  2204012  1614770    4249      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878214] [71030]   113 71030  2209593  2083374    4272      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878215] [71033]   113 71033  2203940   178025    2673      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878216] [71034]   113 71034  2206090   426476    3624      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878218] [71057]   113 71057  2204867  2144196    4265      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878219] [71058]   113 71058  2204546   224493    2893      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878220] [71113]   113 71113  2209581  2127791    4272      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878222] [71276]   113 71276  2209713  2125684    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878223] [71315]   113 71315  2203984   678258    4234      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878224] [71316]   113 71316  2209663  2137633    4273      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878225] [71425]   113 71425  2203985  1229779    4250      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878226] [71426]   113 71426  2207773  2089808    4271      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878228] [71479]   113 71479  2209624  2137703    4273      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878229] [71566]   113 71566  2205224   109789    1497      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878230] [71634]   113 71634  2204084    42530     640      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878232] [71636]   113 71636  2204166    36964     547      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878233] [71637]   113 71637  2203758     8574     167      10        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878234] [71861]   113 71861  2204034  1659821    4249      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878235] [71878]   113 71878  2204101    27948     404      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878237] [73036]   113 73036  2204208    23556     315      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878238] [73043]   113 73043  2204332    15593     234      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878239] [73167]   113 73167  2209593  2124044    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878240] [73168]   113 73168  2204014  1505503    4251      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878242] [73426]   113 73426  2206039   247425    2686      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878243] [73714]   113 73714  2204379    66347     562       9        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878244] [73735]   113 73735  2207590  2128748    4270      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878246] [74062]   113 74062  2209849  2101538    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878247] [74196]   113 74196  2204283    34071     526      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878248] [74203]   113 74203  2204922   150413    1889      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878249] [74581]   113 74581  2209626  2125857    4273      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878250] [74817]   113 74817  2204033  1637797    4250      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878252] [75281]   113 75281  2204276    56982     690      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878253] [75282]   113 75282  2205141   106605    1299      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878254] [75283]   113 75283  2203982    27531     348      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878256] [75309]   113 75309  2205662   128795    1652      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878257] [76231]   113 76231  2204009    53847     803      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878258] [77386]   113 77386  2203883    40653     602      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878260] [77917]   113 77917  2204815   846041    4248      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878261] [77925]   113 77925  2203999  1394845    4251      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878262] [77957]   113 77957  2204961  1375124    4264      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878263] [77958]   113 77958  2203998  1645874    4250      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878264] [77994]   113 77994  2209598  2029427    4273      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878265] [78004]   113 78004  2204876   990859    4261      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878266] [78009]   113 78009  2209693  2074139    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878268] [78010]   113 78010  2204012  1528597    4248      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878269] [78011]   113 78011  2209708  2130929    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878270] [78012]   113 78012  2209906  2116419    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878271] [78013]   113 78013  2203951   568349    4225      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878272] [78021]   113 78021  2203819    14483     210      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878273] [78028]   113 78028  2209930  2138334    4275      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878274] [78076]   113 78076  2204009  1648542    4248      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878275] [78077]   113 78077  2204008  1622033    4250      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878276] [78231]   113 78231  2209778  2125564    4273      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878278] [78232]   113 78232  2204006  1467730    4251      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878279] [78748]   113 78748  2209554  2091379    4272      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878281] [79120]   113 79120  2207656  2129657    4270      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878282] [79121]   113 79121  2209649  2136786    4274      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878283] [79122]   113 79122  2203949   342314    3972      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878285] [80062]   113 80062  2204008  1257889    4249      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878286] [81048]   113 81048  2205151   130415    1862      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878287] [81050]   113 81050  2203941    43779     627      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878289] [84879]   113 84879  2205000  1285857    4263      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878290] [85403]   113 85403  2204492    74870    1073      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878291] [85404]   113 85404  2204962   112681    1425      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878292] [89649]   113 89649  2204322    56650     734      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878294] [90729]   113 90729  2204495    95699    1334      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878295] [90732]   113 90732  2203804     9328     184      10        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878296] [90755]   113 90755  2204363    81196    1100      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878298] [92006]   113 92006  2204032  1592146    4248      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878299] [95662]   113 95662  2203779    11467     223      11        0             0 postgres
Jun 19 21:29:49 server-name kernel: [17009377.878300] [100918]   113 100918  2204778   843529    4246      11        0             0 postgres
...
Jun 19 21:29:49 server-name kernel: [17009377.878360] [59692]     0 59692     3610      841      12       3        0             0 bash
Jun 19 21:29:49 server-name kernel: [17009377.878361] [61771]     0 61771     3610       83      11       3        0             0 bash
Jun 19 21:29:49 server-name kernel: [17009377.878362] Out of memory: Kill process 71057 (postgres) score 130 or sacrifice child
Jun 19 21:29:49 server-name kernel: [17009377.878616] Killed process 71057 (postgres) total-vm:8819468kB, anon-rss:5948kB, file-rss:8570836kB
    
by Alexander Kolodziej 23.06.2018 / 23:01

2 answers

8

The problem is explained here:

Jun 19 21:29:49 server-name kernel: [17009377.877956] bash invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0

And here:

Jun 19 21:29:49 server-name kernel: [17009377.878115] Node 1 Normal: 50883*4kB (UME) 133*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 204596kB

The kernel is requesting order-2 memory (that is, a contiguous block of 16kB or more), and the GFP flags indicate that it MUST come from 'thisnode'. Given that Node 1 has no free blocks of order 2 or higher, I imagine Node 1 is the node it is trying.

The flags also indicate that it may swap in order to free memory and make this allocation succeed. There is no swap, though.

I don't know for sure, but I imagine that having a small amount of swap would fix this (since the kernel could swap to satisfy the request). I also suspect it would allow memory compaction (which reorders physical memory to increase what is available for higher-order requests), which would avoid the need to swap so often. Note that this is only a guess.
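
To make the fragmentation argument concrete, here is a minimal Python sketch that parses the "Node 1 Normal" free-list line quoted above and counts the free blocks per order. It assumes the 4kB base page size visible in the log; the helper names (parse_buddy_line, free_blocks_at_or_above) are made up for this illustration and are not part of any kernel tooling.

```python
import re

PAGE_KB = 4  # assumption: 4kB base pages, as the log's "*4kB" entries suggest


def parse_buddy_line(line):
    """Return {order: free_block_count} from a per-zone free-list log line."""
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # A block of size 4kB * 2**order has the given order.
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


def free_blocks_at_or_above(counts, order):
    """Count free blocks large enough for an allocation of this order."""
    return sum(n for o, n in counts.items() if o >= order)


# Line copied from the kernel log in the question.
node1 = ("Node 1 Normal: 50883*4kB (UME) 133*8kB (UM) 0*16kB 0*32kB 0*64kB "
         "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 204596kB")

counts = parse_buddy_line(node1)
total_kb = sum(n * PAGE_KB * (1 << o) for o, n in counts.items())

print(total_kb)                            # 204596, matches the log's total
print(free_blocks_at_or_above(counts, 2))  # 0
```

So Node 1 has about 200MB free, but all of it sits in 4kB and 8kB pieces, and no single free block is large enough for an order-2 (16kB) request.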

    
answered 23.06.2018 / 23:40
1

You could upgrade to a newer kernel. I thought there was a change that improves compaction around filesystem allocation, but I don't have the commit at hand. This looks like Ubuntu, so you could try Bionic.

For more detail on fragmentation within zones, see Matthew's answer from a few years ago: Linux oom situation. You can tune vm.extfrag_threshold, or trigger compaction manually in an emergency via /proc/sys/vm/compact_memory.
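
As a rough sketch of how one might watch for this condition, the Python below parses text in the format of /proc/buddyinfo and lists zones with no free block of order 2 or higher. /proc/buddyinfo is not mentioned in the original posts; I am assuming it as the live source of the same per-order counts the OOM log prints. The sample text is built from the numbers in the question's log, not captured from a real /proc/buddyinfo, and the function names are hypothetical.

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal <count order 0> <count order 1> ..."
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(x) for x in parts[4:]]
    return zones


def starved_zones(zones, order=2):
    """Zones that have no free block of the given order or higher."""
    return [key for key, counts in zones.items() if sum(counts[order:]) == 0]


def trigger_compaction(path="/proc/sys/vm/compact_memory"):
    """Ask the kernel to compact all zones (needs root; not called here)."""
    with open(path, "w") as f:
        f.write("1")


# Illustrative sample built from the free-list counts in the question's log.
# On a live system you would read open("/proc/buddyinfo").read() instead.
sample = """\
Node 0, zone   Normal  39107    179      0      1      1      1      1      1      1      0      0
Node 1, zone   Normal  50883    133      0      0      0      0      0      0      0      0      0
"""

print(starved_zones(parse_buddyinfo(sample)))  # [(1, 'Normal')]
```

A check like this could run from cron or a monitoring agent and call trigger_compaction() only when a zone is starved; whether that is enough to avoid the OOM kill on this kernel is something you would have to test.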

    
answered 24.06.2018 / 21:38