GlusterFS split-brain


I have been running into performance problems with my GlusterFS setup. We pushed a new build of the live application and, suddenly, all the GlusterFS clients and masters started showing high CPU utilization. This is causing real pain. My setup is as follows:

I have two master servers for GlusterFS, running version 3.7.4.

[root@gfs1 glusterfs]# gluster volume info

Volume Name: repl-vol
Type: Replicate
Volume ID: 7535cfad-6bb9-4147-9fea-e869e7b8d565
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs1.myhost.com:/GlusterFS/repl-data
Brick2: gfs2.myhost.com:/GlusterFS/repl-data
Options Reconfigured:
cluster.self-heal-window-size: 100
performance.cache-max-file-size: 2MB
performance.cache-size: 256MB
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32
cluster.data-self-heal-algorithm: diff
nfs.disable: off

[root@gfs2 ec2-user]# gluster volume info

Volume Name: repl-vol
Type: Replicate
Volume ID: 7535cfad-6bb9-4147-9fea-e869e7b8d565
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs1.myhost.com:/GlusterFS/repl-data
Brick2: gfs2.myhost.com:/GlusterFS/repl-data
Options Reconfigured:
cluster.self-heal-window-size: 100
nfs.disable: off
cluster.data-self-heal-algorithm: diff
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB
performance.cache-size: 256MB
performance.cache-max-file-size: 2MB
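
Since GFID mismatches usually show up when the two bricks accept writes independently (for example after a peer or client disconnect), I assume the first thing to confirm is that both peers and the brick processes are actually up. Something like (volume name as above):

gluster peer status                # each peer should report "Peer in Cluster (Connected)"
gluster volume status repl-vol     # both bricks and the self-heal daemon should show Online = Y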

I have about 14 clients using GlusterFS. GlusterFS is hosting 1.2 TB of data, which is basically static content (JS / CSS / images). We have been observing a sudden spike in server CPU utilization, and network I/O is also very high, 125 MB/s to 250 MB/s. I checked the logs and found mainly the issues below:

[2015-09-09 03:13:33.797655] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <3fd13508-b29e-4d52-8c9c-14ccd2f24b9f/100000130641_4.jpg>, ed715d52-4a39-46db-901b-16ae13f01898 on repl-vol-client-1 and 0bc0c058-b6a7-4f0d-9d46-96f7fcded0f3 on repl-vol-client-0. Skipping conservative merge on the file.
[2015-09-09 03:13:36.074219] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <3fd13508-b29e-4d52-8c9c-14ccd2f24b9f/100000132992_4.jpg>, 8b67cc38-df53-43c7-ad42-b9c616b980b1 on repl-vol-client-1 and 41f393de-9d83-4f52-bfcf-832e31a27a87 on repl-vol-client-0. Skipping conservative merge on the file.
[2015-09-09 03:13:36.076681] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <3fd13508-b29e-4d52-8c9c-14ccd2f24b9f/100000132995_4.jpg>, b1dd578b-3dfe-43dc-ad3a-d54c86298278 on repl-vol-client-1 and bd7c42b9-575f-46bc-9f56-804994f27ab0 on repl-vol-client-0. Skipping conservative merge on the file.
[2015-09-09 04:00:50.975933] I [MSGID: 108026] [afr-self-heal-entry.c:589:afr_selfheal_entry_do] 0-repl-vol-replicate-0: performing entry selfheal on cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3
[2015-09-09 04:00:51.005409] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
[2015-09-09 04:00:51.011467] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
[2015-09-09 04:00:51.014205] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
[2015-09-09 04:00:51.046092] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
[2015-09-09 04:10:53.125065] I [MSGID: 108026] [afr-self-heal-entry.c:589:afr_selfheal_entry_do] 0-repl-vol-replicate-0: performing entry selfheal on cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3
[2015-09-09 04:10:53.225256] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
[2015-09-09 04:10:53.232229] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
[2015-09-09 04:10:53.236203] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-repl-vol-client-0: remote operation failed. Path: (null) (00000000-0000-0000-0000-000000000000) [No data available]
[2015-09-09 04:10:53.343344] E [MSGID: 108008] [afr-self-heal-entry.c:253:afr_selfheal_detect_gfid_and_type_mismatch] 0-repl-vol-replicate-0: Gfid mismatch detected for <cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3/100000160597.jpg>, 68c6fd47-6edc-46fe-8992-2d662bc698e8 on repl-vol-client-1 and 43e1a033-ad08-495b-b762-757cb2f566c0 on repl-vol-client-0. Skipping conservative merge on the file.
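
To tell whether the self-heal traffic itself is what is driving the CPU and network usage, I assume the heal backlog and per-FOP load can be checked like this (profiling adds some overhead of its own, so I would turn it off afterwards):

gluster volume heal repl-vol statistics heal-count   # pending heal entries per brick
gluster volume profile repl-vol start                # enable I/O profiling on the volume
gluster volume profile repl-vol info                 # per-brick FOP counts and latencies
gluster volume profile repl-vol stop                 # stop profiling when done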

The two main errors are remote operation failed and Gfid mismatch. I even tried to resolve the split-brain, but it seems I am doing something wrong or it is not working.

Steps to recover:

[root@gfs2 ec2-user]# gluster volume heal repl-vol info split-brain
Brick gfs1.myhost.com:/GlusterFS/repl-data/
<gfid:cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3>
/media/klevu_images/1/0
Number of entries in split-brain: 2

Brick gfs2.myhost.com:/GlusterFS/repl-data/
/media/klevu_images/1/0
<gfid:cc9d0e49-c9ab-4dab-bca4-1c06c8a7a4e3>
Number of entries in split-brain: 2

So I simply deleted the files above and tried gluster volume heal repl-data
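
In case the deletion is the part I am getting wrong: as far as I understand, on 3.7 a GFID mismatch cannot be resolved through the split-brain CLI; the stale copy has to be removed directly on the brick, together with its hard link under .glusterfs, and then looked up from a client mount so the good copy is healed back (also, the heal command takes the volume name, so gluster volume heal repl-vol rather than repl-data). A sketch for the file from the log, assuming gfs2 holds the bad copy, the file lives under /media/klevu_images/1/0, and /mnt/repl-vol is a client mount (all of that would need checking first):

# on gfs2, directly on the brick, not through a mount
rm /GlusterFS/repl-data/media/klevu_images/1/0/100000160597.jpg
# remove its gfid hard link: .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<gfid>
# the gfid on gfs2 (repl-vol-client-1) in the log is 68c6fd47-6edc-46fe-8992-2d662bc698e8
rm /GlusterFS/repl-data/.glusterfs/68/c6/68c6fd47-6edc-46fe-8992-2d662bc698e8
# then trigger the heal from the surviving copy
stat /mnt/repl-vol/media/klevu_images/1/0/100000160597.jpg
gluster volume heal repl-vol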

I am not really sure that resolving the split-brain problem will fix my performance problem, and new split-brains keep appearing. My main goal is to fix the performance.
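
If the constant healing does turn out to be what is burning CPU and network, the knobs I am planning to experiment with are the ones already set on the volume; this is only a sketch of what I would try, not something I have verified:

# diff healing checksums every block on both bricks (CPU-heavy); full copies whole files instead
gluster volume set repl-vol cluster.data-self-heal-algorithm full
# heal fewer blocks per file in parallel (currently 100)
gluster volume set repl-vol cluster.self-heal-window-size 8
# leave healing to the self-heal daemon instead of the client mounts
gluster volume set repl-vol cluster.data-self-heal off
gluster volume set repl-vol cluster.entry-self-heal off
gluster volume set repl-vol cluster.metadata-self-heal off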

    
by SAM, 09.09.2015 / 06:37

0 answers