How to balance Elasticsearch (Java) on a very powerful machine?

Question

On a 16-core Xeon-class RHEL server with 128 GB of RAM, I want to deploy Elasticsearch.

Which of these is preferable in terms of performance?

1. Run one huge Elasticsearch process that uses all the resources of the native host.
2. Split the host into, for example, 4 equal virtual machines (KVM) and deploy an Elasticsearch cluster with one Elasticsearch instance in each VM.
3. Create Docker containers on the native host and deploy the Elasticsearch cluster in them.

Thanks!

performance java virtualization docker elasticsearch

by yannisf 25.05.2016 / 07:26

2 answers

score 1

Options 2 and 3 are the best, because the documentation suggests a maximum of 32 GB of memory assigned to each node.

answered 25.05.2016 / 20:58

score 2 · Accepted Answer

Option 4: run multiple instances/nodes on the same machine.

This is like options 2 and 3, except that it is simpler because there is no virtualization or containerization. Everything runs natively on the host.

Here is a list of caveats and recommendations for doing this (from here).

The maximum heap size for each node instance should be below 32 GB. A heap above 32 GB is actually counterproductive, because the JVM stops compressing object pointers.

Leave 50% of the memory to the file system cache for Lucene.
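The two memory rules above can be combined into a rough per-node sizing rule. This is a minimal shell sketch, not an official formula: `heap_per_node_gb` is a hypothetical helper, and the 30 GB cap is an assumed safety margin below the 32 GB limit, since the exact cutoff for compressed pointers varies by JVM.

```shell
#!/bin/sh
# Hypothetical helper: suggest a heap size (in GB) per Elasticsearch node.
#   - half of the RAM is left to the OS file system cache (for Lucene)
#   - the other half is split evenly between the nodes on the host
#   - each heap is capped at 30 GB (assumed margin below the 32 GB
#     compressed-pointer limit; the exact cutoff depends on the JVM)
heap_per_node_gb() {
  total_ram_gb=$1
  nodes=$2
  max_heap_gb=30
  heap=$(( total_ram_gb / 2 / nodes ))
  if [ "$heap" -gt "$max_heap_gb" ]; then
    heap=$max_heap_gb
  fi
  echo "$heap"
}

# Example for the host in the question (128 GB of RAM).
for n in 1 2 4; do
  h=$(heap_per_node_gb 128 "$n")
  echo "$n node(s): -Xms${h}g -Xmx${h}g"
done
```

For 128 GB this gives a 30 GB heap for one or two nodes and 16 GB for four nodes. Setting `-Xms` equal to `-Xmx` is a common recommendation; how the flags are passed to Elasticsearch (an environment variable or a `jvm.options` file) depends on the version.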

While you may have enough RAM to run multiple instances on the same machine, test to see if there is enough CPU/processing power.

You will also want to make sure that the multiple node instances are not competing for disk space or disk I/O. Our recommendation is to give all the nodes on the machine a RAID 0 with lots of disks underneath, or a dedicated disk per node.

Lower the processors setting accordingly. Each ES node detects the number of cores available on the machine and is not aware of other nodes present. With multiple nodes on the same machine, each node can think that it has dedicated access to all cores on the machine, which can be problematic because the default thread pool sizes are derived from this. So you will want to specify the number of available cores explicitly via the processors setting, so that the nodes do not end up overallocating their thread pools. For example, roughly (number of cores / number of nodes) per node is a good starting point.
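The rule of thumb for the processors setting can be sketched the same way. `processors_per_node` is a hypothetical helper. `nproc` (GNU coreutils) counts logical CPUs, so with hyper-threading it may report twice the 16 physical cores mentioned in the question. The `processors` key is the setting named above; newer Elasticsearch releases call it `node.processors`.

```shell
#!/bin/sh
# Hypothetical helper: cores / nodes, never less than 1.
processors_per_node() {
  cores=$1
  nodes=$2
  p=$(( cores / nodes ))
  if [ "$p" -lt 1 ]; then
    p=1
  fi
  echo "$p"
}

# Logical CPUs on this host; falls back to 16 (the question's core count)
# if nproc is not available.
cores=$(nproc 2>/dev/null || echo 16)
nodes=4   # assumed number of nodes on the host

# Line to put in each node's elasticsearch.yml
echo "processors: $(processors_per_node "$cores" "$nodes")"
```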

Keep in mind that multiple nodes also means that network connections, OS file descriptors and mmap file limits are shared between the nodes, so you will want to make sure that there is enough bandwidth and that the limits are set high enough to accommodate all the nodes.
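Before starting several nodes, the relevant limits can be checked with a small script. This sketch assumes Linux (`/proc/sys`) and uses the minimums from Elastic's documentation: 65536 open files per process and `vm.max_map_count` of 262144. `ulimit -n` and `vm.max_map_count` apply per process, so each node needs them on its own, while `fs.file-max` is a system-wide total shared by all nodes. `check_min` is a hypothetical helper, and scaling `fs.file-max` by the node count is a rough assumption.

```shell
#!/bin/sh
# Hypothetical helper: report whether a limit meets a required minimum.
# Usage: check_min NAME CURRENT REQUIRED
check_min() {
  if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
    echo "OK: $1=$2 (need >= $3)"
  else
    echo "LOW: $1=$2 (need >= $3)"
  fi
}

nodes=4   # assumed number of Elasticsearch nodes on this host

# Per-process limits: every node needs these individually.
check_min "ulimit -n" "$(ulimit -n)" 65536
if [ -r /proc/sys/vm/max_map_count ]; then
  check_min "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" 262144
fi

# System-wide limit: shared by all nodes (rough assumption: scale by node count).
if [ -r /proc/sys/fs/file-max ]; then
  check_min "fs.file-max" "$(cat /proc/sys/fs/file-max)" $(( 65536 * nodes ))
fi
```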

The more nodes you have on the machine, the more nodes will fail at once if a single server goes down. Also, you will want to make sure that you don't end up with all copies of a shard on the same machine. You can prevent this by setting cluster.routing.allocation.same_shard.host to true. See here for details.
To ensure cluster stability, each dedicated master node instance should be on its own machine (it can certainly be a much smaller machine, e.g. 4 GB of RAM to start with).

Keep in mind that multiple nodes on a machine means additional complexity in management (e.g. keeping track of different ports, config files, etc.). A good way to manage the configuration for multiple instances is to create a separate elasticsearch.yml file per instance. For example, you can pass the -Des.config parameter to specify the yml file for each instance on startup:
$ bin/elasticsearch -Des.config=$ES_HOME/config/elasticsearch.1.yml
$ bin/elasticsearch -Des.config=$ES_HOME/config/elasticsearch.2.yml
Each yml file should point to the same cluster name.

It is helpful to specify meaningful node names.

Use explicit port numbers for each node so that they are predictable (e.g. http.port and transport.tcp.port).

Each node should have its own path.* directories (e.g. path.data, path.logs, path.work, path.plugins), so that the nodes do not end up with conflicting locations for data, plugins, logs, etc.
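The per-instance recommendations above (shared cluster name, distinct node names, explicit ports, separate paths, plus the same_shard.host and processors settings mentioned earlier) can be generated instead of maintained by hand. This is a sketch under assumptions: `write_node_configs`, the cluster name `my-cluster`, the directory layout and the port numbering from 9200/9300 are all illustrative. It leaves out path.work and path.plugins, which belong to older Elasticsearch versions.

```shell
#!/bin/sh
# Hypothetical generator: one elasticsearch.N.yml per instance. All files
# share the cluster name; node name, ports and paths differ per instance.
# Usage: write_node_configs CONF_DIR DATA_ROOT NODES PROCESSORS
write_node_configs() {
  conf_dir=$1     # where the yml files are written, e.g. $ES_HOME/config
  data_root=$2    # parent directory for the per-node data/log directories
  nodes=$3
  processors=$4
  i=1
  while [ "$i" -le "$nodes" ]; do
    cat > "$conf_dir/elasticsearch.$i.yml" <<EOF
cluster.name: my-cluster
node.name: $(uname -n)-node-$i
http.port: $(( 9200 + i - 1 ))
transport.tcp.port: $(( 9300 + i - 1 ))
path.data: $data_root/node-$i/data
path.logs: $data_root/node-$i/logs
processors: $processors
cluster.routing.allocation.same_shard.host: true
EOF
    i=$(( i + 1 ))
  done
}

# Example: 4 instances with 4 processors each (16-core host from the question).
out=$(mktemp -d)
write_node_configs "$out" /var/lib/elasticsearch 4 4
cat "$out/elasticsearch.2.yml"
rm -rf "$out"
```

Each generated file can then be passed to its instance with -Des.config, as in the startup commands shown earlier; newer Elasticsearch versions select the configuration directory differently.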

As mentioned in another answer, you don't want to use more than 32 GB per instance, and you also don't want to use all of your RAM for the Java heap. Instead, it is better to leave at least 50% of it to the operating system for file system caching.

There is a very good explanation of why this is the case in an article on the Elastic blog.