GlusterFS is failing to mount on boot


I'm running the official GlusterFS 3.5 packages on an Ubuntu 12.04 box that acts as both client and server, and everything seems to work fine except mounting the GlusterFS volumes at boot time. This is what I see in the log files:

[2014-06-17 08:20:52.969258] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.0 (/usr/sbin/glusterfs --volfile-server= --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-06-17 08:20:52.998985] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-06-17 08:20:52.999048] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-06-17 08:20:53.000373] E [socket.c:2161:socket_connect_finish] 0-glusterfs: connection to failed (Connection refused)
[2014-06-17 08:20:53.000427] E [glusterfsd-mgmt.c:1601:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: (No data available)
[2014-06-17 08:20:53.000442] I [glusterfsd-mgmt.c:1607:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2014-06-17 08:20:53.013793] W [glusterfsd.c:1095:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/ [0x7f686e0160f7] (-->/usr/lib/x86_64-linux-gnu/ [0x7f686e019cc4] (-->/usr/sbin/glusterfs(+0xcada) [0x7f686e6ddada]))) 0-: received signum (1), shutting down
[2014-06-17 08:20:53.013830] I [fuse-bridge.c:5444:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.
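
The "Connection refused" error above would be consistent with nothing listening yet on the glusterd management port (TCP 24007 by default) at the moment the mount ran. A minimal sketch of a check for that, assuming bash (it relies on bash's /dev/tcp feature) and a hypothetical helper name, port_open; the host to test is whatever follows --volfile-server= in the mount command:

```shell
#!/bin/bash
# Hypothetical helper, not part of GlusterFS.
# Return 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
port_open() {
    local host=$1 port=$2
    # The subshell opens the socket on fd 3 and closes it again when it exits.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Example (not executed here):
#   port_open localhost 24007 && echo "glusterd is accepting connections"
```

A successful connection only shows that glusterd is reachable; it says nothing about whether the brick processes are already running.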

My fstab contains:

proc        /proc                        proc    defaults                       0       0
/dev/xvda   /                            ext4    noatime,errors=remount-ro      0       1
/dev/xvdb   none                         swap    sw                             0       0
/dev/xvdc   /var/lib/glusterfs/brick01   ext4    defaults                       1       2
/var/www/shared/private/uploads glusterfs defaults,_netdev 0 0

I know this used to be a bug in GlusterFS 3.2 on Ubuntu, but I understand it was resolved in the PPA packages for GlusterFS 3.4, as shown here: link

I also remember this working in an experiment I ran with a couple of virtual machines (but since it worked, I didn't pay much attention to it). I see that the gluster-client packages provide an upstart job called mount-glusterfs.conf, which contains:

author "Louis Zuckerman <[email protected]>"
description "Block the mounting event for glusterfs filesystems until the network interfaces are running"

instance $MOUNTPOINT

start on mounting TYPE=glusterfs
exec start wait-for-state WAIT_FOR=static-network-up WAITER=mounting-glusterfs-$MOUNTPOINT

But I'm not sure how this is supposed to work. It doesn't seem to work out of the box. Even though the glusterfs volumes are mounted after the network comes up, that happens before GlusterFS itself starts:

 * Starting RPC portmapper replacement                                   [ OK ]
 * Stopping rpcsec_gss daemon                                            [ OK ]
 * Starting Start this job to wait until rpcbind is started or fails to s[ OK ]
 * Starting configure network device                                     [ OK ]
 * Stopping Start this job to wait until rpcbind is started or fails to s[ OK ]
 * Starting Bridge socket events into upstart                            [ OK ]
 * Starting NSM status monitor                                           [ OK ]
 * Stopping cold plug devices                                            [ OK ]
 * Stopping log initial device creation                                  [ OK ]
 * Starting load fallback graphics devices                               [ OK ]
 * Starting configure network device security                            [ OK ]
 * Starting load fallback graphics devices                               [fail]
 * Starting configure virtual network devices                            [ OK ]
 * Starting Send an event to indicate plymouth is up                     [ OK ]
 * Stopping Send an event to indicate plymouth is up                     [ OK ]
 * Starting Mount network filesystems                                    [ OK ]
 * Stopping configure virtual network devices                            [ OK ]
 * Stopping Mount network filesystems                                    [ OK ]
 * Starting Mount network filesystems                                    [ OK ]
 * Stopping Mount network filesystems                                    [ OK ]
 * Starting configure network device                                     [ OK ]
 * Starting set sysctls from /etc/sysctl.conf                            [ OK ]
 * Stopping set sysctls from /etc/sysctl.conf                            [ OK ]
The disk drive for /var/www/shared/public/uploads is not ready yet or not present.
Continue to wait, or Press S to skip mounting or M for manual recovery
 * Starting Waiting for state                                            [fail]
 * Starting Block the mounting event for glusterfs filesystems until the [fail]k interfaces are running
mountall: Event failed

Mount failed. Please check the log file for more details.
 * Starting GNU Screen Cleanup                                           [ OK ]
 * Starting flush early job output to logs                               [ OK ]
 * Starting base                                                         [ OK ]
 * Starting save udev log and update rules                               [ OK ]
 * Starting OpenSSH server                                               [ OK ]
 * Stopping Failsafe Boot Delay                                          [ OK ]
 * Starting System V initialisation compatibility                        [ OK ]
 * Stopping save udev log and update rules                               [ OK ]
 * Stopping Mount filesystems on boot                                    [ OK ]
 * Stopping GNU Screen Cleanup                                           [ OK ]
 * Stopping flush early job output to logs                               [ OK ]
 * Starting system logging daemon                                        [ OK ]
 * Stopping System V initialisation compatibility                        [ OK ]
 * Starting System V runlevel compatibility                              [ OK ]
 * Starting save kernel messages                                         [ OK ]
 * Starting deferred execution scheduler                                 [ OK ]
 * Starting CPU interrupts balancing daemon                              [ OK ]
 * Starting regular background program processing daemon                 [ OK ]
 * Starting automatic crash report generation                            [ OK ]
 * Starting GlusterFS Management Daemon                                  [ OK ]

Any idea what is going on and/or how to fix it?

As a workaround that I'm not too excited about, I tried having an upstart job mount these volumes. I added noauto to my glusterfs fstab entries so that they would not be mounted automatically at boot, and created an upstart job with these contents:

description "Mount public uploads"

start on started glusterfs-server

exec mount /var/www/shared/public/uploads

When I rebooted the server, the volume was not mounted. /var/log/upstart/mount_public_uploads.log contains:

Mount failed. Please check the log file for more details.

and /var/log/glusterfs/var-www-shared-public-uploads.log contains:

[2014-06-19 15:01:47.170299] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.0 (/usr/sbin/glusterfs --volfile-server= --volfile-id=/public_uploads /var/www/shared/public/uploads)
[2014-06-19 15:01:47.190852] I [socket.c:3561:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-06-19 15:01:47.190933] I [socket.c:3576:socket_init] 0-glusterfs: using system polling thread
[2014-06-19 15:01:50.613939] I [dht-shared.c:311:dht_init_regex] 0-public_uploads-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2014-06-19 15:01:50.616107] I [socket.c:3561:socket_init] 0-public_uploads-client-0: SSL support is NOT enabled
[2014-06-19 15:01:50.616128] I [socket.c:3576:socket_init] 0-public_uploads-client-0: using system polling thread
[2014-06-19 15:01:50.616158] I [client.c:2273:notify] 0-public_uploads-client-0: parent translators are ready, attempting connect on transport
Final graph:
  1: volume public_uploads-client-0
  2:     type protocol/client
  3:     option remote-host
  4:     option remote-subvolume /var/lib/glusterfs/brick01/public_uploads
  5:     option transport-type socket
  6:     option username 51275c7d-33b4-46cc-b8e9-9c06b5dfcda5
  7:     option password 36401ce2-18e7-427e-b126-30d2d9351480
  8:     option transport.socket.ssl-enabled off
  9: end-volume
 11: volume public_uploads-dht
 12:     type cluster/distribute
 13:     subvolumes public_uploads-client-0
 14: end-volume
 16: volume public_uploads-write-behind
 17:     type performance/write-behind
 18:     subvolumes public_uploads-dht
 19: end-volume
 21: volume public_uploads-read-ahead
 22:     type performance/read-ahead
 23:     subvolumes public_uploads-write-behind
 24: end-volume
 26: volume public_uploads-io-cache
 27:     type performance/io-cache
 28:     subvolumes public_uploads-read-ahead
 29: end-volume
 31: volume public_uploads-quick-read
 32:     type performance/quick-read
 33:     subvolumes public_uploads-io-cache
 34: end-volume
 36: volume public_uploads-open-behind
 37:     type performance/open-behind
 38:     subvolumes public_uploads-quick-read
 39: end-volume
 41: volume public_uploads-md-cache
 42:     type performance/md-cache
 43:     subvolumes public_uploads-open-behind
 44: end-volume
 46: volume public_uploads
 47:     type debug/io-stats
 48:     option latency-measurement off
 49:     option count-fop-hits off
 50:     subvolumes public_uploads-md-cache
 51: end-volume
[2014-06-19 15:01:50.619723] E [client-handshake.c:1742:client_query_portmap_cbk] 0-public_uploads-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2014-06-19 15:01:50.619795] I [client.c:2208:client_rpc_notify] 0-public_uploads-client-0: disconnected from Client process will keep trying to connect to glusterd until brick's port is available
[2014-06-19 15:01:50.629922] I [fuse-bridge.c:4946:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-06-19 15:01:50.630166] I [fuse-bridge.c:3883:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.22
[2014-06-19 15:01:50.630473] W [fuse-bridge.c:739:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2014-06-19 15:01:50.642752] I [fuse-bridge.c:4787:fuse_thread_proc] 0-fuse: unmounting /var/www/shared/public/uploads
[2014-06-19 15:01:50.643121] W [glusterfsd.c:1095:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/ [0x7f6d5111c3fd] (-->/lib/x86_64-linux-gnu/ [0x7f6d513efe9a] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x7f6d51ee91b5]))) 0-: received signum (15), shutting down
[2014-06-19 15:01:50.643144] I [fuse-bridge.c:5444:fini] 0-fuse: Unmounting '/var/www/shared/public/uploads'.

of which I think this is the important line:

[2014-06-19 15:01:50.619723] E [client-handshake.c:1742:client_query_portmap_cbk] 0-public_uploads-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.

If I start the mount_public_uploads job manually, the volume mounts correctly. Maybe it is trying to mount before glusterfs is ready?
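
If the mount really is racing the brick process, one way to test that theory is to retry the mount a few times instead of running it once. A minimal sketch in POSIX sh, assuming a hypothetical helper named retry (not something shipped with GlusterFS or upstart); the attempt count and delay are arbitrary:

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        # Give up once the attempt budget is used.
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example (not executed here), e.g. from the script section of the upstart job:
#   retry 30 1 mount /var/www/shared/public/uploads
```

If the volume mounts after a handful of retries, that would support the idea that the job starts before the brick port is registered with glusterd.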

by pupeno 13.06.2014 / 10:57

2 answers


This seems to be a known issue which, according to README.Ubuntu, should be fixed in Ubuntu 14.04.

A possible workaround for earlier Ubuntu versions could be to defer the volume mount with a custom upstart job that runs after the GlusterFS server has started.
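
A deferred job like this may fire more than once (for example when the server is restarted), so it can help to skip the mount when the volume is already there. A minimal sketch of such a guard, assuming a hypothetical helper named is_gluster_mounted and relying on the fact that FUSE-mounted GlusterFS volumes appear in /proc/mounts with type fuse.glusterfs:

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNTPOINT is listed as a glusterfs FUSE mount.
# Usage: is_gluster_mounted MOUNTPOINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; the parameter exists to ease testing.
is_gluster_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 == "fuse.glusterfs" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Example (not executed here):
#   is_gluster_mounted /var/www/shared/public/uploads || mount /var/www/shared/public/uploads
```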

answered 13.06.2014 / 11:56

Maybe you installed glusterfs-client from the Ubuntu repository before setting up your PPA? In that case you would not have the updated version.
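
One way to check this is to look at the version the client reports when it starts; the mount log in the question records it on the "Started running ... version" line. A minimal sketch, assuming a hypothetical helper named logged_client_versions that reads such a log file:

```shell
#!/bin/sh
# Hypothetical helper: print the client version from each "Started running" line
# of a glusterfs mount log.
# Usage: logged_client_versions LOGFILE
logged_client_versions() {
    sed -n 's/.*Started running [^ ]* version \([0-9][0-9.]*\).*/\1/p' "$1"
}

# Examples (not executed here):
#   logged_client_versions /var/log/glusterfs/var-www-shared-public-uploads.log
#   apt-cache policy glusterfs-client   # shows which repository the installed package came from
```

For the log posted in the question this would print 3.5.0, which suggests the client there is not the older package from the Ubuntu repository.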

answered 18.06.2014 / 15:55