Em relação à demonstração do TeraSort no hadoop, sugira se o sintoma é o esperado ou se a carga de trabalho deve ser distribuída.
Iniciou o Hadoop (3 nós em um cluster) e executou o benchmark TeraSort como abaixo em Execuções .
Eu esperava que todos os 3 nós ficassem ocupados e todo o CPU fosse utilizado (400% no topo). No entanto, apenas o nó no qual o job começou ficou ocupado e a CPU não foi totalmente utilizada. Por exemplo, se é iniciado em sydspark02, top mostra como abaixo.
Gostaria de saber se isso é o esperado ou se há um problema de configuração pelo qual a carga de trabalho não é distribuída entre os nós.
sydspark02
top - 13:37:12 up 5 days, 2:58, 2 users, load average: 0.22, 0.06, 0.12
Tasks: 134 total, 1 running, 133 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.5 us, 2.7 sy, 0.0 ni, 69.8 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 8175980 total, 1781888 used, 6394092 free, 68 buffers
KiB Swap: 0 total, 0 used, 0 free. 532116 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2602 hadoop 20 0 2191288 601352 22988 S 120.7 7.4 0:15.52 java
1197 hadoop 20 0 105644 976 0 S 0.3 0.0 0:00.16 sshd
2359 hadoop 20 0 2756336 270332 23280 S 0.3 3.3 0:08.87 java
sydspark01
top - 13:38:32 up 2 days, 19:28, 2 users, load average: 0.15, 0.07, 0.11
Tasks: 141 total, 1 running, 140 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 1.2 sy, 0.0 ni, 96.6 id, 2.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 10240364 total, 10092352 used, 148012 free, 648 buffers
KiB Swap: 0 total, 0 used, 0 free. 8527904 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10635 hadoop 20 0 2766264 238540 23160 S 3.0 2.3 0:11.15 java
11353 hadoop 20 0 2770680 287504 22956 S 1.0 2.8 0:08.97 java
11057 hadoop 20 0 2888396 327260 23068 S 0.7 3.2 0:12.42 java
sydspark03
top - 13:44:21 up 5 days, 1:01, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 8175980 total, 4552876 used, 3623104 free, 1156 buffers
KiB Swap: 0 total, 0 used, 0 free. 3818884 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29374 hadoop 20 0 2729012 204180 22952 S 3.0 2.5 0:07.47 java
> sbin/start-dfs.sh
Starting namenodes on [sydspark01]
sydspark01: starting namenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-namenode-sydspark01.out
sydspark03: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark03.out
sydspark02: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark02.out
sydspark01: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-sydspark01.out
> sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-sydspark01.out
sydspark01: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark01.out
sydspark03: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark03.out
sydspark02: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark02.out
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teragen 100000000 /user/hadoop/terasort-input
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort /user/hadoop/terasort-input /user/hadoop/terasort-output
escravos
sydspark01
sydspark02
sydspark03
core-sites.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://sydspark01:9000</value>
</property>
</configuration>
Ubuntu 14.04.4 LTS 4CPU no VMWare
Hadoop 2.7.3
Java 8
Ao executar o JMC no nó de dados no nó em que a tarefa é executada.
CPU
Somente cerca de 25% do recurso da CPU (1 CPU de 4) é usado.
Memória
Tags hadoop distribution