Page allocation failures on iSCSI storage

We have a CentOS 6.3 iSCSI server (16 GB of RAM) running over an InfiniBand bus (IPoIB).

When the load is high, I see many errors like this:

Sep  3 23:22:20 stor4 kernel: tgtd: page allocation failure. order:2, mode:0x20
Sep  3 23:22:20 stor4 kernel: Pid: 3637, comm: tgtd Not tainted 2.6.32 #1
Sep  3 23:22:20 stor4 kernel: Call Trace:
Sep  3 23:22:20 stor4 kernel: [] ? __alloc_pages_nodemask+0x77f/0x940
Sep  3 23:22:20 stor4 kernel: [] ? kmem_getpages+0x62/0x170
Sep  3 23:22:20 stor4 kernel: [] ? fallback_alloc+0x1ba/0x270
Sep  3 23:22:20 stor4 kernel: [] ? cache_grow+0x2cf/0x320
Sep  3 23:22:20 stor4 kernel: [] ? ____cache_alloc_node+0x99/0x160
Sep  3 23:22:20 stor4 kernel: [] ? pskb_expand_head+0x64/0x270
Sep  3 23:22:20 stor4 kernel: [] ? __kmalloc+0x189/0x220
Sep  3 23:22:20 stor4 kernel: [] ? pskb_expand_head+0x64/0x270
Sep  3 23:22:20 stor4 kernel: [] ? __pskb_pull_tail+0x2aa/0x360
Sep  3 23:22:20 stor4 kernel: [] ? tcp_init_tso_segs+0x37/0x50
Sep  3 23:22:20 stor4 kernel: [] ? dev_queue_xmit+0x4bb/0x6f0
Sep  3 23:22:20 stor4 kernel: [] ? neigh_connected_output+0xbd/0x100
Sep  3 23:22:20 stor4 kernel: [] ? ip_finish_output+0x237/0x310
Sep  3 23:22:20 stor4 kernel: [] ? ip_output+0xb8/0xc0
Sep  3 23:22:20 stor4 kernel: [] ? __ip_local_out+0x9f/0xb0
Sep  3 23:22:20 stor4 kernel: [] ? ip_local_out+0x25/0x30
Sep  3 23:22:20 stor4 kernel: [] ? ip_queue_xmit+0x190/0x420
Sep  3 23:22:20 stor4 kernel: [] ? sock_aio_write+0x167/0x180
Sep  3 23:22:20 stor4 kernel: [] ? tcp_transmit_skb+0x3fe/0x7b0
Sep  3 23:22:20 stor4 kernel: [] ? tcp_write_xmit+0x1fb/0xa20
Sep  3 23:22:20 stor4 kernel: [] ? __tcp_push_pending_frames+0x30/0xe0
Sep  3 23:22:20 stor4 kernel: [] ? tcp_push_pending_frames+0x33/0x40
Sep  3 23:22:20 stor4 kernel: [] ? do_tcp_setsockopt+0x3d6/0x480
Sep  3 23:22:20 stor4 kernel: [] ? tcp_setsockopt+0x2a/0x30
Sep  3 23:22:20 stor4 kernel: [] ? sock_common_setsockopt+0x14/0x20
Sep  3 23:22:20 stor4 kernel: [] ? sys_setsockopt+0x7f/0xe0
Sep  3 23:22:20 stor4 kernel: [] ? system_call_fastpath+0x16/0x1b
Sep  3 23:22:20 stor4 kernel: Mem-Info:
Sep  3 23:22:20 stor4 kernel: Node 0 DMA per-cpu:
Sep  3 23:22:20 stor4 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Sep  3 23:22:20 stor4 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Sep  3 23:22:20 stor4 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Sep  3 23:22:20 stor4 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Sep  3 23:22:20 stor4 kernel: Node 0 DMA32 per-cpu:
Sep  3 23:22:20 stor4 kernel: CPU    0: hi:  186, btch:  31 usd: 183
Sep  3 23:22:20 stor4 kernel: CPU    1: hi:  186, btch:  31 usd:  23
Sep  3 23:22:20 stor4 kernel: CPU    2: hi:  186, btch:  31 usd: 183
Sep  3 23:22:20 stor4 kernel: CPU    3: hi:  186, btch:  31 usd: 181
Sep  3 23:22:20 stor4 kernel: Node 0 Normal per-cpu:
Sep  3 23:22:20 stor4 kernel: CPU    0: hi:  186, btch:  31 usd: 171
Sep  3 23:22:20 stor4 kernel: CPU    1: hi:  186, btch:  31 usd:  29
Sep  3 23:22:20 stor4 kernel: CPU    2: hi:  186, btch:  31 usd:  32
Sep  3 23:22:20 stor4 kernel: CPU    3: hi:  186, btch:  31 usd:  32
Sep  3 23:22:20 stor4 kernel: active_anon:1875 inactive_anon:2473 isolated_anon:0
Sep  3 23:22:20 stor4 kernel: active_file:1243637 inactive_file:2505055 isolated_file:0
Sep  3 23:22:20 stor4 kernel: unevictable:0 dirty:268338 writeback:0 unstable:0
Sep  3 23:22:20 stor4 kernel: free:86050 slab_reclaimable:132377 slab_unreclaimable:23744
Sep  3 23:22:20 stor4 kernel: mapped:1293 shmem:222 pagetables:720 bounce:0
Sep  3 23:22:20 stor4 kernel: Node 0 DMA free:15732kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15332kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep  3 23:22:20 stor4 kernel: lowmem_reserve[]: 0 2172 16060 16060
Sep  3 23:22:20 stor4 kernel: Node 0 DMA32 free:107544kB min:18268kB low:22832kB high:27400kB active_anon:468kB inactive_anon:2364kB active_file:566208kB inactive_file:976112kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2224900kB mlocked:0kB dirty:96816kB writeback:0kB mapped:908kB shmem:12kB slab_reclaimable:176940kB slab_unreclaimable:968kB kernel_stack:64kB pagetables:192kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep  3 23:22:20 stor4 kernel: lowmem_reserve[]: 0 0 13887 13887
Sep  3 23:22:20 stor4 kernel: Node 0 Normal free:220924kB min:116772kB low:145964kB high:175156kB active_anon:7032kB inactive_anon:7528kB active_file:4408340kB inactive_file:9044108kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14220800kB mlocked:0kB dirty:976536kB writeback:0kB mapped:4264kB shmem:876kB slab_reclaimable:352568kB slab_unreclaimable:94008kB kernel_stack:2048kB pagetables:2688kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Sep  3 23:22:20 stor4 kernel: lowmem_reserve[]: 0 0 0 0
Sep  3 23:22:20 stor4 kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15732kB
Sep  3 23:22:20 stor4 kernel: Node 0 DMA32: 16305*4kB 4381*8kB 353*16kB 8*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 107900kB
Sep  3 23:22:20 stor4 kernel: Node 0 Normal: 14548*4kB 14808*8kB 2420*16kB 31*32kB 5*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 220784kB
Sep  3 23:22:20 stor4 kernel: 3748822 total pagecache pages
Sep  3 23:22:20 stor4 kernel: 0 pages in swap cache
Sep  3 23:22:20 stor4 kernel: Swap cache stats: add 0, delete 0, find 0/0
Sep  3 23:22:20 stor4 kernel: Free swap  = 975864kB
Sep  3 23:22:20 stor4 kernel: Total swap = 975864kB
Sep  3 23:22:20 stor4 kernel: 4194303 pages RAM
Sep  3 23:22:20 stor4 kernel: 126915 pages reserved
Sep  3 23:22:20 stor4 kernel: 3753534 pages shared
Sep  3 23:22:20 stor4 kernel: 213500 pages non-shared
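
For readers decoding the dump: order:2 means the kernel asked for 2^2 = 4 physically contiguous pages (16 kB with 4 kB pages), and mode:0x20 most likely corresponds to an atomic, non-sleeping allocation on kernels of this generation, which would fit the TCP transmit path in the trace. The per-zone lines near the end list free blocks by size. Below is a minimal sketch, assuming 4 kB pages and the exact line format shown above, that counts the free blocks of at least a given order in one such line. The function name free_blocks_at_least is hypothetical.

```bash
#!/usr/bin/env bash
# Count free buddy-allocator blocks of at least a given order in
# "Node 0 <zone>: N*4kB N*8kB ..." lines read from stdin.
# Assumes 4 kB pages, so order k corresponds to blocks of 4 * 2^k kB.
free_blocks_at_least() {
    local min_order=$1
    awk -v min_order="$min_order" '
    {
        min_kb = 4 * 2 ^ min_order       # smallest block size that qualifies
        total = 0
        for (i = 1; i <= NF; i++) {
            # Fields look like "2420*16kB": block count, then block size.
            if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                split($i, part, "[*]")
                size_kb = part[2] + 0    # "16kB" + 0 evaluates to 16 in awk
                if (size_kb >= min_kb) total += part[1]
            }
        }
        print total
    }'
}

# Normal-zone line copied from the log above.
free_blocks_at_least 2 <<'EOF'
Node 0 Normal: 14548*4kB 14808*8kB 2420*16kB 31*32kB 5*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 220784kB
EOF
```

For the Normal zone this prints 2457. The dump is a snapshot taken after the failure, so the count is only indicative of how fragmented memory was at that moment.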

TCP stack and VM configuration:

net.core.rmem_max = 83886080
net.core.wmem_max = 83886080
net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.ipv4.tcp_rmem = 40960 1048560 4194304
net.ipv4.tcp_wmem = 40960 196608  4194304
net.ipv4.tcp_mem = 16388608 16388608 16388608
vm.min_free_kbytes=135168
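
A note on units that may help when reading these values: to my knowledge net.ipv4.tcp_mem is expressed in pages, while tcp_rmem and tcp_wmem are in bytes. The sketch below converts a page count to MiB, assuming 4 kB pages. The helper name pages_to_mib is hypothetical.

```bash
#!/usr/bin/env bash
# Convert a page count to MiB. The page size defaults to 4096 bytes, which
# is an assumption; confirm it on the host with: getconf PAGESIZE
pages_to_mib() {
    local pages=$1 page_bytes=${2:-4096}
    echo $(( pages * page_bytes / 1048576 ))
}

pages_to_mib 16388608   # each of the three tcp_mem values above
pages_to_mib 4194303    # "pages RAM" from the Mem-Info dump
```

Under that assumption the two calls print 64018 and 16383, so each tcp_mem threshold would be roughly 62.5 GiB on a machine with 16 GiB of RAM.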

Additional tuning:

/sbin/blockdev --setra 16384 /dev/sdb
echo 2048 > /sys/block/sdb/queue/nr_requests
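
For scale, blockdev --setra takes its argument in 512-byte sectors, so the value can be converted to KiB as sketched here. The helper name ra_sectors_to_kib is hypothetical.

```bash
#!/usr/bin/env bash
# blockdev --setra counts 512-byte sectors; convert that count to KiB.
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

ra_sectors_to_kib 16384   # the --setra value used above
```

This prints 8192, that is 8 MiB of readahead per request stream. The same setting should also be visible in kB under /sys/block/sdb/queue/read_ahead_kb.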

Where could the problem be? Thanks.

    
by Dave 03.09.2012 / 22:39

1 answer

A few things you can try... but iSCSI over IPoIB seems a bit messy. Obviously, performance must matter if you're using InfiniBand.

  • Apart from the errors, how is the performance?
  • Is this reproducible? Can you trigger it on demand, or are the messages just accumulating in the dmesg buffer?
  • Which filesystem are you using on the mounted iSCSI device? That may influence my recommendations.
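
To get a feel for how often the failures occur, something like the following could help. It is a rough sketch that assumes syslog lines in the same format as the excerpt in the question. The function name failures_per_hour is hypothetical.

```bash
#!/usr/bin/env bash
# Count "page allocation failure" messages per hour from syslog-style
# input on stdin. Assumes the "Mon D HH:MM:SS host kernel: ..." layout
# shown in the question.
failures_per_hour() {
    awk '/page allocation failure/ {
        split($3, t, ":")                        # $3 is HH:MM:SS
        key = $1 " " $2 " " t[1] ":00"           # e.g. "Sep 3 23:00"
        if (!(key in count)) order[++n] = key    # remember first-seen order
        count[key]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}
```

On CentOS 6 the kernel messages normally land in /var/log/messages, so failures_per_hour < /var/log/messages would be the usual way to run it.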

Either way, since you're on CentOS 6.3, I'd seriously consider enabling the tuned-adm profile set. If it isn't already installed, run yum install tuned tuned-utils and try the "enterprise-storage" profile:

tuned-adm profile enterprise-storage

This will switch your I/O elevators to the deadline scheduler, change kernel.sched_min_granularity_ns to 10 ms, make some adjustments to the vm subsystem, remove write barriers, modify the CPU governor and increase disk readahead. You can also move your sysctl and sysfs settings into a custom profile.
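
If you want to see what actually changed, a small helper along these lines can be run before and after applying the profile. It is only a sketch: the function names are hypothetical, sdb is taken from the question, and it checks just a few of the settings mentioned above.

```bash
#!/usr/bin/env bash
# Extract the active I/O scheduler from the contents of
# /sys/block/<dev>/queue/scheduler, where the active one is shown in
# brackets, e.g. "noop anticipatory deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Print the active tuned profile and a few of the settings it touches.
# The device defaults to sdb (from the question); pass another if needed.
show_tuning() {
    local dev=${1:-sdb}
    tuned-adm active
    echo "scheduler: $(active_scheduler < "/sys/block/$dev/queue/scheduler")"
    echo "read_ahead_kb: $(cat "/sys/block/$dev/queue/read_ahead_kb")"
    sysctl kernel.sched_min_granularity_ns
}
```

Calling show_tuning sdb once before and once after tuned-adm profile enterprise-storage gives a quick before/after comparison.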

You can revert to the original settings with tuned-adm off. These commands are safe to run on a live system. Can you test and report back?

    
by 03.09.2012 / 23:56