We have two datacenters (192.X.X.X and 10.X.X.X) between which gossip (port 7001) is possible, but neither Thrift nor the native protocol. OpsCenter runs on a node in the first datacenter (192.X.X.X).
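The reachability constraints can be verified with a quick TCP check from the OpsCenter host. A minimal sketch, assuming the Cassandra default ports (7001 for SSL gossip, 9042 for the native protocol, 9160 for Thrift; your cluster may use others):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# In our setup, from the OpsCenter host towards the remote DC only gossip
# should succeed:
#   port_open("10.0.0.1", 7001)  -> expected True  (gossip allowed)
#   port_open("10.0.0.1", 9042)  -> expected False (native protocol blocked)
#   port_open("10.0.0.1", 9160)  -> expected False (Thrift blocked)
```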
After upgrading from OpsCenter 5.1.3 to OpsCenter 5.2.0 on CentOS 6.6, the dashboard only shows "Cannot connect to cluster".
The opscenterd.log file shows repeated attempts to connect to the cluster. It starts with a connection to a seed node:
2015-08-10 11:52:04+0200 [Cluster_01] DEBUG: Connecting to cluster, contact points: ['192.168.0.100', '192.168.0.101']; protocol version: 2
2015-08-10 11:52:04+0200 [] DEBUG: Host 192.168.0.100 is now marked up
2015-08-10 11:52:04+0200 [] DEBUG: Host 192.168.0.101 is now marked up
2015-08-10 11:52:04+0200 [Cluster_01] DEBUG: [control connection] Opening new connection to 192.168.0.100
2015-08-10 11:52:04+0200 [] INFO: Starting factory
2015-08-10 11:52:04+0200 [Cluster_01] DEBUG: [control connection] Established new connection , registering watchers and refreshing schema and topology
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Refreshing node list and token map using preloaded results
The following part is repeated for each node in the other datacenter, and also for each node in the local datacenter that is not in the seed node list:
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Found new host to connect to: 10.0.0.1
2015-08-10 11:52:05+0200 [Cluster_01] INFO: New Cassandra host 10.0.0.1 discovered
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Handling new host 10.0.0.1 and notifying listeners
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Not adding connection pool for new host 10.0.0.1 because the load balancing policy has marked it as IGNORED
2015-08-10 11:52:05+0200 [] DEBUG: Host 10.0.0.1 is now marked up
The log continues for a while until the control connection is closed:
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Finished fetching ring info
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Rebuilding token map due to topology changes
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Attempting to use preloaded results for schema agreement
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Schemas match
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] user types table not found
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Fetched schema, rebuilding metadata
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Control connection created
2015-08-10 11:52:05+0200 [] DEBUG: Initializing new connection pool for host 192.168.0.100
2015-08-10 11:52:05+0200 [] INFO: Starting factory
2015-08-10 11:52:05+0200 [] INFO: Starting factory
2015-08-10 11:52:05+0200 [] DEBUG: Finished initializing new connection pool for host 192.168.0.100
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Added pool for host 192.168.0.100 to session
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Shutting down Cluster Scheduler
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Not executing scheduled task due to Scheduler shutdown
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Shutting down control connection
2015-08-10 11:52:05+0200 [] DEBUG: Closing connection (46700368) to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Closed socket to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Closing connection (44407568) to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Closed socket to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Connect lost: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
]
2015-08-10 11:52:05+0200 [] DEBUG: Closing connection (47567568) to 192.168.0.100
2015-08-10 11:52:05+0200 [] INFO: Stopping factory
2015-08-10 11:52:05+0200 [] DEBUG: Closed socket to 192.168.0.100
2015-08-10 11:52:05+0200 [] DEBUG: Connect lost: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
]
2015-08-10 11:52:05+0200 [] INFO: Stopping factory
2015-08-10 11:52:05+0200 [] DEBUG: Connect lost: [Failure instance: Traceback (failure with no frames): : Connection was closed cleanly.
]
2015-08-10 11:52:05+0200 [] INFO: Stopping factory
Then something strange happens: a connection is attempted to the first node in the other datacenter:
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: Connecting to cluster, contact points: ['10.0.0.1']; protocol version: 2
2015-08-10 11:52:05+0200 [] DEBUG: Host 10.0.0.1 is now marked up
2015-08-10 11:52:05+0200 [Cluster_01] DEBUG: [control connection] Opening new connection to 10.0.0.1
2015-08-10 11:52:05+0200 [] INFO: Starting factory
2015-08-10 11:52:07+0200 [] TRACE: Sending heartbeat.
2015-08-10 11:52:10+0200 [Cluster_01] WARN: [control connection] Error connecting to 10.0.0.1: errors=Timed out creating connection, last_host=None
2015-08-10 11:52:10+0200 [Cluster_01] ERROR: Control connection failed to connect, shutting down Cluster: ('Unable to connect to any servers', {'10.0.0.1': OperationTimedOut('errors=Timed out creating connection, last_host=None',)})
2015-08-10 11:52:10+0200 [Cluster_01] DEBUG: Shutting down Cluster Scheduler
2015-08-10 11:52:10+0200 [Cluster_01] DEBUG: Shutting down control connection
2015-08-10 11:52:10+0200 [Cluster_01] DEBUG: Not executing scheduled task due to Scheduler shutdown
2015-08-10 11:52:10+0200 [] WARN: No cassandra connection available for hostlist ['192.168.0.100', '192.168.0.101'] . Retrying.
This fails, of course, since we do not want clients to communicate across datacenters.
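The "marked it as IGNORED" lines earlier in the log match what a DC-aware load balancing policy does when it is allowed zero remote hosts. A minimal plain-Python sketch of that classification logic (the function name and string labels are illustrative, not the driver's actual API):

```python
def classify_host(host_dc, local_dc, used_hosts_per_remote_dc=0,
                  remote_slots_taken=0):
    """Classify a host the way a DC-aware round-robin policy would:
    hosts in the local DC are LOCAL; remote hosts fill at most
    used_hosts_per_remote_dc REMOTE slots; all other remote hosts
    are IGNORED and never get a connection pool."""
    if host_dc == local_dc:
        return "LOCAL"
    if remote_slots_taken < used_hosts_per_remote_dc:
        return "REMOTE"
    return "IGNORED"

# With used_hosts_per_remote_dc = 0, as in our config, every host in the
# other datacenter is IGNORED -- consistent with the "load balancing policy
# has marked it as IGNORED" debug lines above.
print(classify_host("DC2", "DC1"))  # IGNORED
print(classify_host("DC1", "DC1"))  # LOCAL
```

Given that, it is surprising that OpsCenter then opens a fresh control connection using an IGNORED remote host as the sole contact point.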
Even with this cluster configuration, OpsCenter still tries to connect to the other (wrong) datacenter:
[cassandra]
seed_hosts = 192.168.0.100,192.168.0.101
username = opscenter
password = XXX
local_dc_pref = DC1
used_hosts_per_remote_dc = 0
This configuration worked without problems on every OpsCenter version up to 5.2.0. Is it a new requirement that all nodes must be reachable by OpsCenter via the native protocol? Can't I tell OpsCenter to connect only to its local datacenter?