I seem to be having a strange problem with the OOM killer killing processes for no apparent reason. This is an up-to-date Ubuntu 16.04 machine with kernel 4.4.0-62-generic, running 3 VMs and BackupPC with 16 GB of RAM (the machine is a Dell T20). The VMs use 256 MB, 2 GB and 3 GB of RAM. Ubuntu is basically set up with default settings; as far as I know, the main changes after the default install were installing qemu and BackupPC.
[ 0.000000] Memory: 16298836K/16683092K available (8436K kernel code, 1291K rwdata, 3960K rodata, 1488K init, 1316K bss, 384256K reserved, 0K cma-reserved)
The release information:
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
The overcommit settings are the defaults, as follows:
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
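For reference, here is a minimal sketch of how these three values can be read. It only parses the text shown above; the helper name `parse_sysctl` and the mode labels are made up for illustration. As far as I understand the kernel documentation, `vm.overcommit_memory = 0` selects heuristic overcommit, and `vm.overcommit_ratio` is only enforced in mode 2 (strict accounting), so the 50 here should not be acting as a hard limit.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, as printed by sysctl, into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or unrelated lines
        key, _, value = line.partition("=")
        settings[key.strip()] = int(value.strip())
    return settings


# Labels are informal descriptions, not kernel identifiers.
MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting (ratio/kbytes enforced)",
}

# The values quoted in this post.
SAMPLE = """\
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
"""

settings = parse_sysctl(SAMPLE)
mode = settings["vm.overcommit_memory"]
ratio_enforced = mode == 2  # the ratio only matters in strict mode

print("mode:", MODES[mode])
print("overcommit_ratio enforced:", ratio_enforced)
```

On a live system the same text can be obtained with `sysctl vm.overcommit_memory vm.overcommit_ratio vm.overcommit_kbytes`.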
Now the system is killing the VM processes. I don't understand this, because normally the OOM killer kills the process that is using the most memory.
[241816.503021] Killed process 3198 (qemu-system-x86) total-vm:4181796kB, anon-rss:3324684kB, file-rss:3588kB
The process was only using 4 GB of virtual memory and 3 GB RSS. Besides, the machine was not even swapping!
[241816.502934] Free swap = 7953124kB
[241816.502935] Total swap = 8293372kB
Can you tell why the OOM killer is killing processes? What am I missing? It looks like the machine is using less than 7 GB of RAM in total out of the 16 GB installed.
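To show where the "less than 7 GB" figure comes from, here is a rough conversion of the global Mem-Info counters from the first OOM report in the full log. The counters are page counts; the sketch assumes 4 KiB pages (the usual size on x86-64), and the names `mem_info` and `pages_to_gib` are just for illustration.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages

# Global counters copied from the first Mem-Info block in the log (in pages).
mem_info = {
    "active_anon": 1077377,
    "inactive_anon": 526767,
    "active_file": 832229,
    "inactive_file": 670439,
    "slab_reclaimable": 870324,
    "slab_unreclaimable": 29718,
    "free": 46803,
}


def pages_to_gib(pages):
    """Convert a page count to GiB under the 4 KiB page assumption."""
    return pages * PAGE_KIB / (1024 * 1024)


anon_pages = mem_info["active_anon"] + mem_info["inactive_anon"]
file_pages = mem_info["active_file"] + mem_info["inactive_file"]

print(f"anonymous (process) memory: {pages_to_gib(anon_pages):.2f} GiB")
print(f"file page cache:            {pages_to_gib(file_pages):.2f} GiB")
print(f"reclaimable slab:           {pages_to_gib(mem_info['slab_reclaimable']):.2f} GiB")
print(f"free:                       {pages_to_gib(mem_info['free']):.2f} GiB")
```

By this count anonymous memory is a bit over 6 GiB, while most of the rest is page cache and reclaimable slab, with very little actually free.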
The full log is below:
[241816.502856] cron invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[241816.502859] cron cpuset=/ mems_allowed=0
[241816.502862] CPU: 0 PID: 1035 Comm: cron Not tainted 4.4.0-62-generic #83-Ubuntu
[241816.502863] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A06 01/27/2015
[241816.502864] 0000000000000286 00000000bf9ec188 ffff8800da123af0 ffffffff813f7c63
[241816.502866] ffff8800da123cc8 ffff880405b5d400 ffff8800da123b60 ffffffff8120ad4e
[241816.502868] 0000000000000015 0000000000000000 ffff880409ac2540 ffff880407bad400
[241816.502869] Call Trace:
[241816.502873] [<ffffffff813f7c63>] dump_stack+0x63/0x90
[241816.502876] [<ffffffff8120ad4e>] dump_header+0x5a/0x1c5
[241816.502878] [<ffffffff81390c14>] ? apparmor_capable+0xc4/0x1b0
[241816.502881] [<ffffffff811926c2>] oom_kill_process+0x202/0x3c0
[241816.502882] [<ffffffff81192ae9>] out_of_memory+0x219/0x460
[241816.502884] [<ffffffff81198a5d>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[241816.502886] [<ffffffff81198e56>] __alloc_pages_nodemask+0x286/0x2a0
[241816.502887] [<ffffffff81198f0b>] alloc_kmem_pages_node+0x4b/0xc0
[241816.502890] [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[241816.502891] [<ffffffff81213d73>] ? cp_new_stat+0x153/0x180
[241816.502893] [<ffffffff810805a0>] _do_fork+0x80/0x360
[241816.502894] [<ffffffff81080929>] SyS_clone+0x19/0x20
[241816.502897] [<ffffffff818385f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[241816.502898] Mem-Info:
[241816.502900] active_anon:1077377 inactive_anon:526767 isolated_anon:0
active_file:832229 inactive_file:670439 isolated_file:0
unevictable:914 dirty:0 writeback:0 unstable:0
slab_reclaimable:870324 slab_unreclaimable:29718
mapped:5481 shmem:5279 pagetables:5271 bounce:0
free:46803 free_pcp:0 free_cma:0
[241816.502902] Node 0 DMA free:15852kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[241816.502905] lowmem_reserve[]: 0 3376 15901 15901 15901
[241816.502907] Node 0 DMA32 free:85128kB min:14336kB low:17920kB high:21504kB active_anon:633984kB inactive_anon:650080kB active_file:994428kB inactive_file:726700kB unevictable:56kB isolated(anon):0kB isolated(file):0kB present:3578388kB managed:3497768kB mlocked:56kB dirty:0kB writeback:0kB mapped:9292kB shmem:12084kB slab_reclaimable:366052kB slab_unreclaimable:21320kB kernel_stack:1584kB pagetables:4008kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[241816.502909] lowmem_reserve[]: 0 0 12524 12524 12524
[241816.502911] Node 0 Normal free:86232kB min:53180kB low:66472kB high:79768kB active_anon:3675524kB inactive_anon:1456988kB active_file:2334488kB inactive_file:1955056kB unevictable:3600kB isolated(anon):0kB isolated(file):0kB present:13088768kB managed:12825312kB mlocked:3600kB dirty:0kB writeback:0kB mapped:12632kB shmem:9032kB slab_reclaimable:3115244kB slab_unreclaimable:97552kB kernel_stack:2640kB pagetables:17076kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[241816.502913] lowmem_reserve[]: 0 0 0 0 0
[241816.502915] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15852kB
[241816.502921] Node 0 DMA32: 15269*4kB (UME) 3028*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 85300kB
[241816.502925] Node 0 Normal: 21214*4kB (UMEH) 28*8kB (EH) 11*16kB (H) 11*32kB (H) 4*64kB (H) 3*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 86760kB
[241816.502931] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[241816.502931] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[241816.502932] 1532619 total pagecache pages
[241816.502933] 24066 pages in swap cache
[241816.502934] Swap cache stats: add 757347, delete 733281, find 479805/565341
[241816.502934] Free swap = 7953124kB
[241816.502935] Total swap = 8293372kB
[241816.502935] 4170773 pages RAM
[241816.502936] 0 pages HighMem/MovableOnly
[241816.502936] 86040 pages reserved
[241816.502937] 0 pages cma reserved
[241816.502937] 0 pages hwpoisoned
[241816.502938] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[241816.502941] [ 397] 0 397 8819 832 20 3 36 0 systemd-journal
[241816.502942] [ 435] 0 435 25742 229 17 3 0 0 lvmetad
[241816.502944] [ 454] 0 454 11440 574 23 3 488 -1000 systemd-udevd
[241816.502945] [ 1020] 0 1020 68967 1031 36 3 58 0 accounts-daemon
[241816.502947] [ 1022] 0 1022 1100 317 7 3 2 0 acpid
[241816.502948] [ 1029] 0 1029 6322 605 18 3 83 0 smartd
[241816.502949] [ 1031] 0 1031 7470 190 18 3 49 0 cgmanager
[241816.502950] [ 1035] 0 1035 7252 594 21 3 40 0 cron
[241816.502951] [ 1040] 0 1040 6511 477 18 3 35 0 atd
[241816.502952] [ 1042] 107 1042 10726 580 26 3 59 -900 dbus-daemon
[241816.502953] [ 1098] 0 1098 58693 333 17 3 5 0 lxcfs
[241816.502955] [ 1100] 0 1100 7159 461 18 3 60 0 systemd-logind
[241816.502956] [ 1102] 104 1102 64099 451 28 3 213 0 rsyslogd
[241816.502957] [ 1104] 0 1104 53932 1284 29 5 1500 0 snapd
[241816.502958] [ 1189] 0 1189 16380 764 37 4 143 -1000 sshd
[241816.502959] [ 1201] 0 1201 3344 24 11 3 13 0 mdadm
[241816.502960] [ 1208] 0 1208 1306 31 9 3 0 0 iscsid
[241816.502961] [ 1209] 0 1209 1431 878 9 3 0 -17 iscsid
[241816.502963] [ 1216] 0 1216 69278 914 39 4 596 0 polkitd
[241816.502964] [ 1263] 0 1263 365148 1616 170 4 2336 0 libvirtd
[241816.502965] [ 1293] 0 1293 3985 366 13 3 0 0 agetty
[241816.502966] [ 1298] 0 1298 4868 23 14 3 41 0 irqbalance
[241816.502967] [ 1310] 116 1310 27509 654 24 3 113 0 ntpd
[241816.502968] [ 1421] 115 1421 17416 1849 37 3 2222 0 BackupPC
[241816.502969] [ 1422] 115 1422 54531 34434 112 3 9739 0 BackupPC_trashC
[241816.502970] [ 1471] 0 1471 18941 896 40 3 237 0 apache2
[241816.502972] [ 1544] 0 1544 16352 501 24 3 96 0 master
[241816.502973] [ 1546] 114 1546 16881 469 25 3 98 0 qmgr
[241816.502974] [ 1722] 113 1722 12496 352 27 3 97 0 dnsmasq
[241816.502975] [ 1723] 0 1723 12489 1 27 3 93 0 dnsmasq
[241816.502976] [ 1800] 113 1800 12496 0 27 3 98 0 dnsmasq
[241816.502977] [ 1804] 0 1804 48439 806 52 3 13 -900 virtlogd
[241816.502978] [ 1904] 112 1904 472592 285000 721 5 7103 0 qemu-system-x86
[241816.502979] [ 1997] 112 1997 277724 85130 334 4 9316 0 qemu-system-x86
[241816.502981] [ 3198] 112 3198 1045449 832068 1880 7 14166 0 qemu-system-x86
[241816.502982] [29065] 33 29065 18941 603 39 3 243 0 apache2
[241816.502983] [29066] 33 29066 91246 692 69 3 738 0 apache2
[241816.502984] [29067] 33 29067 124032 1274 71 4 225 0 apache2
[241816.502985] [ 5735] 115 5735 295501 258925 578 4 17706 0 BackupPC_dump
[241816.502986] [ 5818] 115 5818 276492 238098 539 4 18790 0 BackupPC_dump
[241816.502988] [ 7774] 114 7774 16869 1111 24 3 0 0 pickup
[241816.502989] Out of memory: Kill process 3198 (qemu-system-x86) score 137 or sacrifice child
[241816.503021] Killed process 3198 (qemu-system-x86) total-vm:4181796kB, anon-rss:3324684kB, file-rss:3588kB
[241816.703137] virbr1: port 4(vnet2) entered disabled state
[241816.704366] device vnet2 left promiscuous mode
[241816.704367] virbr1: port 4(vnet2) entered disabled state
[241819.514670] audit: type=1400 audit(1487210104.861:50): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-c0ed3084-e7d5-4165-b125-8089914fe680" pid=8265 comm="apparmor_parser"
[247217.394936] libvirt-bin invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[247217.394938] libvirt-bin cpuset=/ mems_allowed=0
[247217.394943] CPU: 1 PID: 8920 Comm: libvirt-bin Not tainted 4.4.0-62-generic #83-Ubuntu
[247217.394944] Hardware name: Dell Inc. PowerEdge T20/0VD5HY, BIOS A06 01/27/2015
[247217.394945] 0000000000000286 00000000e1669350 ffff88017aaffaf0 ffffffff813f7c63
[247217.394947] ffff88017aaffcc8 ffff8800da16e200 ffff88017aaffb60 ffffffff8120ad4e
[247217.394948] 0000000000000015 0000000000000000 ffff880409ac2540 ffff880407bad400
[247217.394950] Call Trace:
[247217.394954] [<ffffffff813f7c63>] dump_stack+0x63/0x90
[247217.394957] [<ffffffff8120ad4e>] dump_header+0x5a/0x1c5
[247217.394960] [<ffffffff81390c14>] ? apparmor_capable+0xc4/0x1b0
[247217.394962] [<ffffffff811926c2>] oom_kill_process+0x202/0x3c0
[247217.394964] [<ffffffff8119208e>] ? oom_unkillable_task+0x9e/0xd0
[247217.394965] [<ffffffff81192ae9>] out_of_memory+0x219/0x460
[247217.394967] [<ffffffff81198a5d>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[247217.394969] [<ffffffff81198e56>] __alloc_pages_nodemask+0x286/0x2a0
[247217.394971] [<ffffffff81198f0b>] alloc_kmem_pages_node+0x4b/0xc0
[247217.394974] [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[247217.394976] [<ffffffff811c1660>] ? handle_mm_fault+0xce0/0x1820
[247217.394979] [<ffffffff81037eb9>] ? sched_clock+0x9/0x10
[247217.394982] [<ffffffff810b1bcf>] ? sched_clock_cpu+0x8f/0xa0
[247217.394984] [<ffffffff810805a0>] _do_fork+0x80/0x360
[247217.394985] [<ffffffff81080929>] SyS_clone+0x19/0x20
[247217.394988] [<ffffffff818385f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[247217.394989] Mem-Info:
[247217.394992] active_anon:495436 inactive_anon:332110 isolated_anon:0
active_file:1362581 inactive_file:834329 isolated_file:0
unevictable:914 dirty:5499 writeback:274 unstable:0
slab_reclaimable:959199 slab_unreclaimable:17954
mapped:6609 shmem:5247 pagetables:3469 bounce:0
free:58696 free_pcp:115 free_cma:0
[247217.394994] Node 0 DMA free:15852kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15936kB managed:15852kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[247217.394997] lowmem_reserve[]: 0 3376 15901 15901 15901
[247217.394999] Node 0 DMA32 free:91172kB min:14336kB low:17920kB high:21504kB active_anon:345184kB inactive_anon:361348kB active_file:1469732kB inactive_file:782520kB unevictable:56kB isolated(anon):0kB isolated(file):0kB present:3578388kB managed:3497768kB mlocked:56kB dirty:3892kB writeback:220kB mapped:11244kB shmem:12080kB slab_reclaimable:422984kB slab_unreclaimable:12256kB kernel_stack:1616kB pagetables:2184kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:216 all_unreclaimable? no
[247217.395002] lowmem_reserve[]: 0 0 12524 12524 12524
[247217.395004] Node 0 Normal free:127760kB min:53180kB low:66472kB high:79768kB active_anon:1636560kB inactive_anon:967092kB active_file:3980592kB inactive_file:2554796kB unevictable:3600kB isolated(anon):0kB isolated(file):0kB present:13088768kB managed:12825312kB mlocked:3600kB dirty:18104kB writeback:876kB mapped:15192kB shmem:8908kB slab_reclaimable:3413812kB slab_unreclaimable:59560kB kernel_stack:2592kB pagetables:11692kB unstable:0kB bounce:0kB free_pcp:460kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[247217.395006] lowmem_reserve[]: 0 0 0 0 0
[247217.395008] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15852kB
[247217.395014] Node 0 DMA32: 11405*4kB (UME) 5706*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 91268kB
[247217.395018] Node 0 Normal: 30930*4kB (UMEH) 264*8kB (UMEH) 5*16kB (H) 5*32kB (H) 4*64kB (H) 3*128kB (H) 2*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 127736kB
[247217.395025] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[247217.395025] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[247217.395026] 2234245 total pagecache pages
[247217.395027] 31364 pages in swap cache
[247217.395028] Swap cache stats: add 769200, delete 737836, find 501629/589327
[247217.395029] Free swap = 7999552kB
[247217.395029] Total swap = 8293372kB
[247217.395030] 4170773 pages RAM
[247217.395030] 0 pages HighMem/MovableOnly
[247217.395031] 86040 pages reserved
[247217.395031] 0 pages cma reserved
[247217.395032] 0 pages hwpoisoned
[247217.395032] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[247217.395040] [ 397] 0 397 10970 2059 23 3 31 0 systemd-journal
[247217.395041] [ 435] 0 435 25742 229 17 3 0 0 lvmetad
[247217.395044] [ 454] 0 454 11440 823 23 3 396 -1000 systemd-udevd
[247217.395046] [ 1020] 0 1020 68967 1031 36 3 58 0 accounts-daemon
[247217.395047] [ 1022] 0 1022 1100 317 7 3 2 0 acpid
[247217.395048] [ 1029] 0 1029 6322 605 18 3 83 0 smartd
[247217.395050] [ 1031] 0 1031 7470 190 18 3 49 0 cgmanager
[247217.395051] [ 1035] 0 1035 7252 593 21 3 41 0 cron
[247217.395053] [ 1040] 0 1040 6511 477 18 3 35 0 atd
[247217.395054] [ 1042] 107 1042 10726 580 26 3 59 -900 dbus-daemon
[247217.395055] [ 1098] 0 1098 58693 333 17 3 5 0 lxcfs
[247217.395057] [ 1100] 0 1100 7159 461 18 3 60 0 systemd-logind
[247217.395058] [ 1102] 104 1102 64099 510 28 3 203 0 rsyslogd
[247217.395060] [ 1104] 0 1104 53932 1284 29 5 1500 0 snapd
[247217.395061] [ 1189] 0 1189 16380 764 37 4 143 -1000 sshd
[247217.395063] [ 1201] 0 1201 3344 24 11 3 13 0 mdadm
[247217.395064] [ 1208] 0 1208 1306 31 9 3 0 0 iscsid
[247217.395065] [ 1209] 0 1209 1431 878 9 3 0 -17 iscsid
[247217.395066] [ 1216] 0 1216 69278 914 39 4 596 0 polkitd
[247217.395068] [ 1263] 0 1263 365148 2482 170 4 2162 0 libvirtd
[247217.395069] [ 1293] 0 1293 3985 366 13 3 0 0 agetty
[247217.395070] [ 1298] 0 1298 4868 23 14 3 41 0 irqbalance
[247217.395072] [ 1310] 116 1310 27509 654 24 3 113 0 ntpd
[247217.395073] [ 1421] 115 1421 17416 1864 37 3 2207 0 BackupPC
[247217.395075] [ 1422] 115 1422 54531 34425 112 3 9748 0 BackupPC_trashC
[247217.395076] [ 1471] 0 1471 18941 896 40 3 237 0 apache2
[247217.395077] [ 1544] 0 1544 16352 504 24 3 93 0 master
[247217.395078] [ 1546] 114 1546 16881 469 25 3 98 0 qmgr
[247217.395080] [ 1722] 113 1722 12496 352 27 3 97 0 dnsmasq
[247217.395081] [ 1723] 0 1723 12489 1 27 3 93 0 dnsmasq
[247217.395082] [ 1800] 113 1800 12496 419 27 3 95 0 dnsmasq
[247217.395083] [ 1804] 0 1804 48439 815 52 3 11 -900 virtlogd
[247217.395085] [ 1904] 112 1904 472592 285001 721 5 7102 0 qemu-system-x86
[247217.395086] [ 1997] 112 1997 277724 85130 334 4 9316 0 qemu-system-x86
[247217.395088] [29065] 33 29065 18941 603 39 3 243 0 apache2
[247217.395090] [29066] 33 29066 91246 691 69 3 739 0 apache2
[247217.395091] [29067] 33 29067 124032 1274 71 4 225 0 apache2
[247217.395092] [ 5735] 115 5735 295501 269817 578 4 6814 0 BackupPC_dump
[247217.395094] [ 5818] 115 5818 276492 247915 539 4 9138 0 BackupPC_dump
[247217.395095] [ 8764] 114 8764 16869 1113 25 3 0 0 pickup
[247217.395097] [ 8867] 0 8867 12555 709 30 3 11 0 cron
[247217.395098] [ 8870] 0 8870 1127 189 8 3 0 0 sh
[247217.395099] [ 8871] 0 8871 1092 165 8 3 0 0 run-parts
[247217.395101] [ 8887] 0 8887 1127 441 8 3 0 0 libvirt-bin
[247217.395102] [ 8920] 0 8920 1127 27 8 3 0 0 libvirt-bin
[247217.395103] Out of memory: Kill process 1904 (qemu-system-x86) score 47 or sacrifice child
[247217.395137] Killed process 1904 (qemu-system-x86) total-vm:1890368kB, anon-rss:1136532kB, file-rss:3472kB
[247217.472809] virbr1: port 2(vnet0) entered disabled state
[247217.474014] device vnet0 left promiscuous mode
[247217.474015] virbr1: port 2(vnet0) entered disabled state
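Both reports above show `order=2`, i.e. a request for 2^2 = 4 contiguous pages (16 kB with 4 KiB pages). The per-zone lines of the form `N*SIZEkB` list how many free blocks of each size exist, so a small sketch like the one below can tally the free blocks that are large enough for such a request. The function names are hypothetical, and this is only a way of reading the log, not a diagnosis.

```python
import re

# Matches entries such as "15269*4kB" in a per-zone free-list line.
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def free_blocks(line):
    """Return {block_size_kB: count} for one per-zone free-list line."""
    return {int(size): int(count) for count, size in BLOCK_RE.findall(line)}


def blocks_at_least(line, min_kb):
    """Count free blocks whose size is at least min_kb."""
    return sum(n for size, n in free_blocks(line).items() if size >= min_kb)


ORDER = 2
PAGE_KB = 4  # assumption: 4 KiB pages
NEEDED_KB = PAGE_KB * 2 ** ORDER  # 16 kB of contiguous memory

# Lines copied from the first OOM report (timestamps dropped).
DMA32 = ("Node 0 DMA32: 15269*4kB (UME) 3028*8kB (UE) 0*16kB 0*32kB 0*64kB "
         "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 85300kB")
NORMAL = ("Node 0 Normal: 21214*4kB (UMEH) 28*8kB (EH) 11*16kB (H) 11*32kB (H) "
          "4*64kB (H) 3*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB "
          "0*4096kB = 86760kB")

for name, line in (("DMA32", DMA32), ("Normal", NORMAL)):
    print(name, "free blocks >=", NEEDED_KB, "kB:", blocks_at_least(line, NEEDED_KB))
```

For the first report this gives no blocks of 16 kB or larger in DMA32 and only a few dozen in Normal, all of them tagged `(H)` in the log, even though tens of megabytes are free as 4 kB and 8 kB fragments.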
I am also seeing the following messages at boot. I am not sure whether they are related.
[ 0.000000] mtrr_cleanup: can not find optimal value
[ 0.000000] please specify mtrr_gran_size/mtrr_chunk_size
Also, a few memory ECC errors were logged in the BIOS, but they were from months ago. We have now moved the whole setup to new hardware of the same model, with the BIOS updated to the latest version. So far, memory usage fluctuates at less than half of the machine's memory. We will see soon whether the OOM killer kills processes again. It usually took a week or so ...
KiB Mem : 16338936 total, 173348 free, 6812676 used, 9352912 buff/cache
KiB Swap: 8293372 total, 7672968 free, 620404 used. 9059716 avail Mem
UPDATE: The machine is running fine for now! So the problem was probably related to the ECC errors I saw on the system, OR the BIOS update fixed it. I am not 100% sure, because the whole box was replaced with another machine of the same model and the BIOS was updated. So far so good!