kernel relatado conexão iSCSI 1: erro 0 (1022 - código de erro inválido ou desconhecido) estado (3) Fed 25 / VessRaid / rsnapshot

1

Nós rodamos rsnapshot (rsync) e usamos ionice & legal. No final do uso de memória de backup explode e os erros iSCSI abaixo começam a aparecer. Este é um Dell PowerEdge 1850 com o Fedora 25 com kernel 4.9, 8GB de RAM, 8GB de swap. Isso está conectado a um RAID de hardware chamado VessRAID 1840i. Eu vi a sugestão para desabilitar o dimensionamento da janela TCP .

Outro thread aparentemente relacionado tem sugestões para:

sysctl -w net.ipv4.tcp_tw_recycle=0

além de desativar os timestamps TCP:

sysctl -w net.ipv4.tcp_timestamps=0
To make this change permanent, make an entry in /etc/sysctl.conf.

Um erro semelhante, " iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)" , I encontrou uma sugestão para :

turn off iSCSI pings by setting:

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

And you can turn the replacement_timer to a very long value:

node.session.timeo.replacement_timeout = 86400

Aqui estão os registros de / var / log / messages quando o uso da memória começa a sair do controle:

iscsid: Kernel reported iSCSI connection 1:0 error (1022 - Invalid or unknown error code) state (3)
iscsid: connection1:0 is operational after recovery (1 attempts)
chronyd[761]: Selected source 199.102.46.75
systemd: Starting dnf makecache...
dnf: Metadata cache refreshed recently.’systemd: Started dnf makecache.
systemd-logind: New session 10 of user root.
systemd: Started Session 10 of user root.
systemd-logind: Removed session 10.
kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 76580624, last ping 76585814, now 76591204
kernel: connection1:0: detected conn error (1022)
iscsid: Kernel reported iSCSI connection 1:0 error (1022 - Invalid or unknown error code) state (3)
iscsid: connection1:0 is operational after recovery (1 attempts)
systemd: Starting dnf makecache...
kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 81687108, last ping 81692267, now 81700110

E alguns registros mostrando o uso da memória e os registros do kernel:

automount: page allocation stalls for 25608ms, order:1, mode:0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK)
CPU: 3 PID: 1163 Comm: automount Not tainted 4.9.5-200.fc25.i686+PAE #1
Hardware name: Dell Computer Corporation PowerEdge 1850/0D8266, BIOS A07 04/25/2008
ee55fdb8 d6f6c6c0 d7578510 00000000 ee55fde8 d6d854a6 d757cc59 027000c0
ee55fdf0 ee55fdf8 d7578510 ee55fdcc 1cbb0e3b 00000001 d76fa158 00000001
ee55feb0 d6d86040 027000c0 d7578510 00006408 00000001 ee5a6480 00000013
Call Trace:
[<d6f6c6c0>] dump_stack+0x58/0x78
[<d6d854a6>] warn_alloc+0xf6/0x110
[<d6d86040>] __alloc_pages_nodemask+0xb10/0xce0
[<d6dd5437>] ? kmem_cache_alloc+0xf7/0x1c0
[<d6c6aa89>] ? copy_process.part.44+0x549/0x14e0
[<d6c6a648>] copy_process.part.44+0x108/0x14e0
[<d6c94bf6>] ? check_preempt_curr+0x76/0x80
[<d6c6bbe4>] _do_fork+0xd4/0x370
[<d6c6bf6c>] SyS_clone+0x2c/0x30
[<d6c0369c>] do_int80_syscall_32+0x5c/0xc0
[<d73737e9>] entry_INT80_32+0x31/0x31
Mem-Info:
active_anon:10713 inactive_anon:8264 isolated_anon:0#012 active_file:249801 inactive_file:1557433 isolated_file:0#012 unevictable:1968 dirty:2 writeback:0 unstable:0#012 slab_reclaimable:188422 slab_unreclaimable:5832#012 mapped:8048 shmem:245 pagetables:1236 bounce:0#012 free:42226 free_pcp:0 free_cma:0
Node 0 active_anon:42852kB inactive_anon:33056kB active_file:999204kB inactive_file:6229732kB unevictable:7872kB isolated(anon):0kB isolated(file):0kB mapped:32192kB dirty:8kB writeback:0kB shmem:980kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
DMA free:3040kB min:68kB low:84kB high:100kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15916kB mlocked:0kB slab_reclaimable:12720kB slab_unreclaimable:92kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 751 8055 8055
Normal free:5440kB min:3472kB low:4340kB high:5208kB active_anon:0kB inactive_anon:0kB active_file:4276kB inactive_file:4088kB unevictable:0kB writepending:0kB present:892920kB managed:791232kB mlocked:0kB slab_reclaimable:740968kB slab_unreclaimable:23236kB kernel_stack:1408kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 58430 58430
HighMem free:160424kB min:512kB low:8944kB high:17376kB active_anon:42852kB inactive_anon:33056kB active_file:994928kB inactive_file:6225644kB unevictable:7872kB writepending:8kB present:7479040kB managed:7479040kB mlocked:7872kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:4944kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0 0 0
DMA: 14*4kB (UME) 9*8kB (UME) 4*16kB (UM) 5*32kB (UME) 8*64kB (UME) 7*128kB (UME) 3*256kB (UE) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 3040kB
Normal: 1032*4kB (UMEH) 126*8kB (UME) 20*16kB (UME) 6*32kB (UME) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5648kB
HighMem: 1476*4kB (UM) 3131*8kB (UM) 1817*16kB (UM) 730*32kB (UM) 497*64kB (UM) 161*128kB (M) 22*256kB (M) 38*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 160888kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
1809281 total pagecache pages
228 pages in swap cache
Swap cache stats: add 5032, delete 4804, find 7386/7796
Free swap  = 8342136kB
Total swap = 8355836kB
2096989 pages RAM
1869760 pages HighMem/MovableOnly
25442 pages reserved
0 pages hwpoisoned

Essas discussões relacionadas acima estão relacionadas? Quais alterações de configuração devo tentar?

Também está vendo um erro assim:

kernel: perf: interrupt took too long (9721 > 9697), lowering kernel.perf_event_max_sample_rate to 20000
kernel:  [<d7374857>] error_code+0x67/0x6c
    
por RobbieTheK 26.01.2017 / 20:40

1 resposta

0

Eu recebi algumas respostas de um dos mantenedores do kernel Linux. Isso parece ser um problema com os kernels de 32 bits e com baixa pressão de memória . Aqui estão alguns comentários (depois que eu forneci mais alguns logs e tracepoints):

all those are from the kswapd (background memory reclaim). Which means that it doesn't catch any allocation which can stall for too long. Anyway the above tracepoint show that we are able to make some progress during the reclaim (nr_reclaimed > 0). So I suspect that this is indeed a large lowmem pressure and I do not see what we can do about that.

e também:

and this one is hitting the min watermark while there is not really much to reclaim. Only the page cache which might be pinned and not reclaimable from this context because this is GFP_NOFS request. It is not all that surprising the reclaim context fights to get some memory. There is a huge amount of the reclaimable slab which probably just makes a slow progress.

That is not something completely surprising on 32b system I am afraid.

Parece que não é algo que será consertado em breve, ou nunca. FWIW Eu tenho essas configurações em /etc/sysctl.conf

# Disable TCP window scaling
net.ipv4.tcp_window_scaling=0
# tip from http://serverfault.com/questions/235965/why-would-a-server-not-send-a-syn-ack-packet-in-response-to-a-syn-packet
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_tw_recycle=0
# vm.zone_reclaim_mode=1 CONFIG_NUMA needs to be enabled in kernel configuration for this setting
vm.min_free_kbytes=131072

#tip from https://www.spinics.net/lists/kernel/msg2403670.html
# write-cache, foreground/background flushing
vm.dirty_ratio = 3

# vm.dirty_background_ratio = 5 (% of RAM)
vm.dirty_background_ratio = 1

# make it 10 sec
vm.dirty_writeback_centisecs = 1000

# http://serverfault.com/questions/696156/kswapd-often-uses-100-cpu-when-swap-is-in-use
vm.swappiness=25
vm.vfs_cache_pressure=1000
    
por 02.03.2017 / 17:02