GlusterFS peer shows as disconnected on one of the peers


I am running a GlusterFS cluster with a Trusted Storage Pool consisting of 4 peers:

  • example-prod (100.100.250.197)
  • example-storage1 (100.100.248.178)
  • example-storage2 (100.100.250.25)
  • example-storage3 (100.100.255.40)

It works correctly (= the volume can be mounted, files are stored correctly) with one exception: rebalancing is not happening.
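For reference, the rebalance was started and checked with the standard gluster CLI, roughly like this (a sketch; the exact invocation may have differed):

# start a rebalance on the volume
gluster volume rebalance gv0 start

# check progress / result
gluster volume rebalance gv0 status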

In addition, the peer status output and the entries in glus-glusterfs-glusterd.vol.log are worrying. Something is wrong and I don't know how to fix it.

I am worried that one day the whole system will go down and I will lose all the data, so I think I need to fix these problems.

All servers run GlusterFS 3.7.6 on Ubuntu 16.04.

Output of volume status

gluster> volume status
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick example-storage1:/data/brick1/gv0     49152     0          Y       1413
Brick 100.100.250.25:/data/brick2/gv0       49152     0          Y       3081
Brick 100.100.255.40:/data/brick3/gv0       N/A       N/A        N       N/A
NFS Server on localhost                     N/A       N/A        N       N/A
NFS Server on example-storage2              N/A       N/A        N       N/A
NFS Server on example-storage1.example.com
                                            2049      0          Y       24490
NFS Server on example-storage3              N/A       N/A        N       N/A

Task Status of Volume gv0
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 1ee56040-6bb5-4407-8ae5-f176e6c89db1
Status               : completed
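Brick example-storage3:/data/brick3/gv0 shows Online = N. As far as I understand, a forced volume start can respawn brick processes that are down, but I have not dared to run it before understanding the peer problems below (a sketch, assuming this is the right command):

# restart any brick processes that are not running, without touching the online ones
gluster volume start gv0 force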

Output of peer status

On example-prod

gluster> peer status
Number of Peers: 3

Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)

Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Other names:
example-storage2.example.com

Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Connected)
Other names:
example-storage1.example.com

On example-storage1

Number of Peers: 5

Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)

Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)
Other names:
example-storage2

Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Connected)

Hostname: example-storage3
Uuid: 49d9bc0a-b67d-4850-bff9-edeaa0dac8ca
State: Peer Rejected (Connected)

Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Connected)
Other names:
example-prod

On example-storage2

Number of Peers: 3

Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Connected)
Other names:
example-storage1

Hostname: example-storage3
Uuid: 5e5db480-d789-4ba4-8796-151ecb050ee8
State: Peer in Cluster (Connected)

Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Connected)
Other names:
example-prod

On example-storage3

Note the "Disconnected" state.

Number of Peers: 3

Hostname: example-prod.example.com
Uuid: 4170ef42-770d-4f52-be99-6c6e317f9fa0
State: Peer in Cluster (Disconnected)
Other names:
example-prod

Hostname: example-storage1.example.com
Uuid: 3f76dc73-77f4-4b9a-b1f1-3ba3a9aa26a7
State: Peer in Cluster (Disconnected)
Other names:
example-storage1

Hostname: example-storage2
Uuid: 54566d17-f76b-45d0-82a2-ed8a474289c8
State: Peer in Cluster (Disconnected)
Other names:
example-storage2
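
Two things stand out to me here: example-storage1 lists example-storage3 twice with two different UUIDs (one of them "Peer Rejected"), and example-storage3 sees all other peers as "Disconnected". To narrow this down I would compare the UUIDs stored on disk on each node and test whether the glusterd management port is reachable from example-storage3 (a sketch, assuming the default /var/lib/glusterd layout and port 24007):

# on each node: this node's own UUID
cat /var/lib/glusterd/glusterd.info

# the peer records this node keeps about the others
grep -H . /var/lib/glusterd/peers/*

# from example-storage3: is glusterd on the other nodes reachable?
nc -zv 100.100.250.197 24007   # example-prod
nc -zv 100.100.248.178 24007   # example-storage1
nc -zv 100.100.250.25 24007    # example-storage2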

Output of glus-glusterfs-glusterd.vol.log

On example-prod

// the following line appears every 5 seconds
[2018-04-13 07:07:05.602742] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)
[2018-04-13 07:07:08.603156] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/f86f1461d3e00792ac2b2fefcedc2d08.socket failed (Invalid argument)

On example-storage1

// the following line appears every 5 seconds
[2018-04-13 07:00:38.987432] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
[2018-04-13 07:00:41.987968] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)

On example-storage2

// the following line appears every 5 seconds
[2018-04-13 07:08:24.119264] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
[2018-04-13 07:08:27.119618] W [socket.c:588:__socket_rwv] 0-nfs: readv on /var/run/gluster/aa06e832c27614f8664a5cc2904c3b62.socket failed (Invalid argument)
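
These readv warnings refer to the unix socket of the built-in gluster NFS server. Since the NFS server is shown as not online on most nodes in the volume status above, glusterd seems to keep polling a socket with nothing behind it. If the NFS export is not actually needed, I assume the warnings would stop after disabling it per volume (a sketch):

# turn off the built-in gluster NFS server for this volume
gluster volume set gv0 nfs.disable on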

On example-storage3

// The following lines repeat
[2018-04-13 07:07:54.599955] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 13, Invalid argument
[2018-04-13 07:07:54.600003] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
[2018-04-13 07:08:02.697437] I [MSGID: 106004] [glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management: Peer <example-storage2> (<54566d17-f76b-45d0-82a2-ed8a474289c8>), in state <Peer in Cluster>, has disconnected from glusterd.
[2018-04-13 07:08:04.625465] W [socket.c:869:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid argument
[2018-04-13 07:08:04.625513] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
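
On example-storage3 the management connections fail with keep-alive / TCP_USER_TIMEOUT errors right before the peers are reported as disconnected. I am not sure whether this is the cause or just noise, so I would also check whether any transport/keepalive options are configured in the glusterd volfile and whether the daemon itself is healthy (a sketch; the grep pattern is just a guess at relevant option names):

# any keepalive/timeout options configured for glusterd on this node?
grep -i -E 'keepalive|timeout' /etc/glusterfs/glusterd.vol

# state of the management daemon (the unit is called glusterfs-server on Ubuntu, glusterd elsewhere)
systemctl status glusterfs-server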
    
asked by Silvan Mühlemann 13.04.2018 / 09:22

0 answers
