OpsCenter 5.2 não pode se conectar ao cluster multi-dc

1

Temos dois datacenters (192.X.X.X e 10.X.X.X) entre os quais a fofoca (porta 7001) é possível, mas não a poupança ou o protocolo nativo. O OpsCenter é executado em um nó no primeiro data center (192.X.X.X).

Após a atualização do OpsCenter 5.1.3 para o OpsCenter 5.2.0 no CentOS 6.6, o painel mostra apenas "Não é possível conectar-se ao cluster".

O arquivo opscenterd.log mostra repetidas tentativas de conexão ao cluster.

Começa com a conexão a um nó de origem:

2015-08-10 11:52:04+0200 [Cluster_01] DEBUG: Connecting to cluster, contact points: ['192.168.0.100', '192.168.0.101']; protocol version: 2
2015-08-10 11:52:04+0200 [] DEBUG: Host 192.168.0.100 is now marked up
2015-08-10 11:52:04+0200 [] DEBUG: Host 192.168.0.101 is now marked up
2015-08-10 11:52:04+0200 [Cluster_01] DEBUG: [control connection] Opening new connection to 192.168.0.100
2015-08-10 11:52:04+0200 []  INFO: Starting factory 
2015-08-10 11:52:04+0200 [Cluster_01] DEBUG: [control connection] Established new connection , registering watchers and refreshing schema and topology
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Refreshing node list and token map using preloaded results

A parte a seguir é repetida para cada nó no outro datacenter e também para cada nó do datacenter local que não esteja na lista de nós iniciais:

2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Found new host to connect to: 10.0.0.1
2015-08-10 11:52:05+0200 [Cluster_01]  INFO: New Cassandra host 10.0.0.1 discovered
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Handling new host 10.0.0.1 and notifying listeners
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Not adding connection pool for new host 10.0.0.1 because the load balancing policy has marked it as IGNORED
2015-08-10 11:52:05+0200 [] DEBUG: Host 10.0.0.1 is now marked up

O registro continua um pouco até que a conexão de controle seja fechada:

2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Finished fetching ring info
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Rebuilding token map due to topology changes
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Attempting to use preloaded results for schema agreement
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Schemas match
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] user types table not found
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Fetched schema, rebuilding metadata
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Control connection created
2015-08-10 11:52:05+0200 [] DEBUG: Initializing new connection pool for host 192.168.0.100
2015-08-10 11:52:05+0200 []  INFO: Starting factory 
2015-08-10 11:52:05+0200 []  INFO: Starting factory 
2015-08-10 11:52:05+0200 [] DEBUG: Finished initializing new connection pool for host 192.168.0.100
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Added pool for host 192.168.0.100 to session
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Shutting down Cluster Scheduler
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Not executing scheduled task due to Scheduler shutdown
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Shutting down control connection
2015-08-10 11:52:05+0200 [] DEBUG: Closing connection (46700368) to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Closed socket to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Closing connection (44407568) to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Closed socket to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Connect lost: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
        ]
2015-08-10 11:52:05+0200 [] DEBUG: Closing connection (47567568) to 192.168.0.100
2015-08-10 11:52:05+0200 []  INFO: Stopping factory 
2015-08-10 11:52:05+0200 [] DEBUG: Closed socket to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Connect lost: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
        ]
2015-08-10 11:52:05+0200 []  INFO: Stopping factory 
2015-08-10 11:52:05+0200 [] DEBUG: Connect lost: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
        ]
2015-08-10 11:52:05+0200 []  INFO: Stopping factory 

Então algo estranho acontece: Uma conexão é estabelecida para o primeiro nó no outro centro de dados:

2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Connecting to cluster, contact points: ['10.0.0.1']; protocol version: 2
2015-08-10 11:52:05+0200 [] DEBUG: Host 10.0.0.1 is now marked up
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Opening new connection to 10.0.0.1
2015-08-10 11:52:05+0200 []  INFO: Starting factory 
2015-08-10 11:52:07+0200 [] TRACE: Sending heartbeat.
2015-08-10 11:52:10+0200 [Cluster_01]  WARN: [control connection] Error connecting to 10.0.0.1: errors=Timed out creating connection, last_host=None
2015-08-10 11:52:10+0200 [Cluster_01] ERROR: Control connection failed to connect, shutting down Cluster: ('Unable to connect to any servers', {'10.0.0.1': OperationTimedOut('errors=Timed out creating connection, last_host=None',)})
2015-08-10 11:52:10+0200 [Cluster_01] DEBUG: Shutting down Cluster Scheduler
2015-08-10 11:52:10+0200 [Cluster_01] DEBUG: Shutting down control connection
2015-08-10 11:52:10+0200 [Cluster_01] DEBUG: Not executing scheduled task due to Scheduler shutdown
2015-08-10 11:52:10+0200 []  WARN: No cassandra connection available for hostlist ['192.168.0.100', '192.168.0.101'] .  Retrying.

Isso falha, é claro, já que não queremos que os clientes se comuniquem em data centers.

Mesmo com essa configuração de cluster, o OpsCenter ainda tenta se conectar ao outro datacenter (errado):

[cassandra]
seed_hosts = 192.168.0.100,192.168.0.101
username = opscenter
password = XXX
local_dc_pref = DC1
used_hosts_per_remote_dc = 0

Esta configuração funcionou sem problemas para todas as versões do OpsCenter até 5.2.0. É um novo requisito que todos os nós devem estar acessíveis através do protocolo nativo do OpsCenter? Não posso dizer ao OpsCenter para se conectar apenas ao seu data center local?

    
por Severin Leonhardt 10.08.2015 / 13:14

1 resposta

0

Eu posso confirmar o seu bug e ele pode ser rastreado como OPSC-6299 (desculpe nenhum bug tracker público, mas isso pode ser usado para comunicações com Datastax ou futuras referências de tickets).

O mais curto é que o OpsCenter deve respeitar essa política de balanceamento de carga, é válido, mas neste caso existe um bug.

    
por 11.08.2015 / 22:47

Tags