Um dos nossos aplicativos usa o Elasticsearch (1.4.4) como um cache na memória. O aplicativo é um webapp Java implementado no Tomcat 7 com o Oracle 1.7. A instância elasticsearch é uma configuração de um nó implantada no mesmo servidor.
Desde o elasticsearch 1.3.3, estamos expondo cerca de 40 MBit / s para dentro e para fora na interface de loopback entre o aplicativo e o nó Elasticsearch com um aplicativo ocioso.
Isso não é muito, mas causa uma carga perceptível para um sistema que, de outra forma, seria flatlining. Eu não tenho um sistema de produção com este aplicativo em mãos, então não posso dizer como ele evolui em prod.
Agarrar o tráfego via tcpdump e analisá-lo no Wireshark mostra que o Elasticsearch-Client no aplicativo solicita continuamente o Node por
cluster/node/info
, o que resulta em uma resposta de 10k todas as vezes.
Talvez totalmente não relacionado, mas permitindo que o registro do servidor e do cliente nos forneça:
Log do servidor do Elasticsearch:
[2015-05-12 14:45:01,600][INFO ][node ] [Illyana Rasputin] initializing ...
[2015-05-12 14:45:01,608][INFO ][plugins ] [Illyana Rasputin] loaded [], sites []
[2015-05-12 14:45:06,666][INFO ][node ] [Illyana Rasputin] initialized
[2015-05-12 14:45:06,667][INFO ][node ] [Illyana Rasputin] starting ...
[2015-05-12 14:45:06,828][INFO ][transport ] [Illyana Rasputin] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.24.1.128:9300]}
[2015-05-12 14:45:06,851][INFO ][discovery ] [Illyana Rasputin] bkbo_index/TITPDFdtR6SXX5EeOXaidg
[2015-05-12 14:45:09,892][INFO ][cluster.service ] [Illyana Rasputin] new_master [Illyana Rasputin][TITPDFdtR6SXX5EeOXaidg][dev06][inet[/10.24.1.128:9300]], reason: zen-disco-join (elected_as_master)
[2015-05-12 14:45:09,943][INFO ][http ] [Illyana Rasputin] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.24.1.128:9200]}
[2015-05-12 14:45:09,944][INFO ][node ] [Illyana Rasputin] started
[2015-05-12 14:45:11,283][INFO ][gateway ] [Illyana Rasputin] recovered [2] indices into cluster_state
Cliente do Elasticsearch:
2015-05-12 14:46:40,683 INFO [localhost-startStop-1] PluginsService:<init>:151 [Antiphon the Overseer] loaded [], sites []
2015-05-12 14:46:41,548 DEBUG [localhost-startStop-1] TransportClientNodesService:<init>:110 [Antiphon the Overseer] node_sampler_interval[5ms]
2015-05-12 14:46:41,594 DEBUG [localhost-startStop-1] TransportClientNodesService:addTransportAddresses:167 [Antiphon the Overseer] adding address [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]]
2015-05-12 14:46:41,625 DEBUG [localhost-startStop-1] NettyTransport:connectToNode:751 [Antiphon the Overseer] connected to node [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]]
2015-05-12 14:46:41,655 INFO [localhost-startStop-1] TransportClientNodesService$SimpleNodeSampler:doSample:371 [Antiphon the Overseer] failed to get node info for [#transport#-1][dev06][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [0] timed out after [6ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-05-12 14:46:41,658 DEBUG [localhost-startStop-1] NettyTransport:disconnectFromNode:882 [Antiphon the Overseer] disconnecting from [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]] due to explicit disconnect call
2015-05-12 14:46:41,661 DEBUG [elasticsearch[Antiphon the Overseer][generic][T#1]] NettyTransport:connectToNode:751 [Antiphon the Overseer] connected to node [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]]
2015-05-12 14:46:41,669 INFO [elasticsearch[Antiphon the Overseer][generic][T#1]] TransportClientNodesService$SimpleNodeSampler:doSample:371 [Antiphon the Overseer] failed to get node info for [#transport#-1][dev06][inet[localhost/127.0.0.1:9300]], disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException: [][inet[localhost/127.0.0.1:9300]][cluster:monitor/nodes/info] request_id [1] timed out after [5ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-05-12 14:46:41,670 DEBUG [elasticsearch[Antiphon the Overseer][generic][T#1]] NettyTransport:disconnectFromNode:882 [Antiphon the Overseer] disconnecting from [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]] due to explicit disconnect call
2015-05-12 14:46:41,676 DEBUG [elasticsearch[Antiphon the Overseer][generic][T#1]] NettyTransport:connectToNode:751 [Antiphon the Overseer] connected to node [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]]
2015-05-12 14:46:41,677 WARN [elasticsearch[Antiphon the Overseer][transport_client_worker][T#2]{New I/O worker #2}] TransportService$Adapter:remove:280 [Antiphon the Overseer] Received response for a request that has timed out, sent [14ms] ago, timed out [9ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]], id [1]
2015-05-12 14:46:41,682 INFO [localhost-startStop-1] PluginsService:<init>:151 [Ricochet] loaded [], sites []
2015-05-12 14:46:41,722 DEBUG [localhost-startStop-1] TransportClientNodesService:<init>:110 [Ricochet] node_sampler_interval[5ms]
2015-05-12 14:46:41,733 DEBUG [localhost-startStop-1] TransportClientNodesService:addTransportAddresses:167 [Ricochet] adding address [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]]
2015-05-12 14:46:41,734 DEBUG [localhost-startStop-1] NettyTransport:connectToNode:751 [Ricochet] connected to node [[#transport#-1][dev06][inet[localhost/127.0.0.1:9300]]]
2015-05-12 14:46:41,759 DEBUG [elasticsearch[Antiphon the Overseer][generic][T#1]] NettyTransport:connectToNode:751 [Antiphon the Overseer] connected to node [[Illyana Rasputin][TITPDFdtR6SXX5EeOXaidg][dev06][inet[localhost/127.0.0.1:9300]]]
2015-05-12 14:46:41,760 DEBUG [localhost-startStop-1] NettyTransport:connectToNode:751 [Ricochet] connected to node [[Illyana Rasputin][TITPDFdtR6SXX5EeOXaidg][dev06][inet[localhost/127.0.0.1:9300]]]
Sim, este aplicativo tem duas conexões de cliente, que devem ser OK (de acordo com os devs).
Esses ciclos de desconexão / reconexão acontecem a cada minuto ou mais.
Alguma pista do que está acontecendo aqui? Já desabilitei o multicast via discovery.zen.ping.multicast.enabled: false
.