I am trying to set up Slurm on a cluster running Ubuntu 16.04.
I am using Intel MPI, which is installed on the head node under /opt/intel/impi_5.01.
According to the Slurm documentation, an environment variable must be exported that points to libpmi.so.
However, I installed slurm-llnl from the Ubuntu repositories,
sudo apt-get install slurm-llnl
and I am not sure where libpmi.so is located. I searched and found the file below; is this the one I am looking for?
/usr/lib/x86_64-linux-gnu/libpmi.so
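For anyone unsure where the library lives, here is a minimal sketch of locating it and exporting its path. It assumes the variable in question is I_MPI_PMI_LIBRARY, which is what the Slurm MPI guide names for Intel MPI. find_libpmi is a hypothetical helper, and the search directories are guesses that may need adjusting.

```shell
# Hypothetical helper: print the first libpmi.so found under the given directories.
find_libpmi() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -quit stops at the first match (GNU find, as shipped with Ubuntu).
        match="$(find "$dir" -name 'libpmi.so' -print -quit 2>/dev/null || true)"
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1
}

# Assumed search locations; adjust for your distribution.
if pmi_lib="$(find_libpmi /usr/lib /usr/local/lib)"; then
    # Assumption: Intel MPI reads this variable to locate Slurm's PMI library.
    export I_MPI_PMI_LIBRARY="$pmi_lib"
    echo "I_MPI_PMI_LIBRARY=$I_MPI_PMI_LIBRARY"
else
    echo "libpmi.so not found; is the Slurm PMI library installed?" >&2
fi
```

The export only affects the current shell, so it has to be repeated in the job script or in a shell profile before srun is called.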
In any case, I exported the variable and tried
srun -p old -N3 -n24 hostname
It returns,
rolly@head:~$ srun -p old -N3 -n24 hostname
node02
node02
node02
node02
node02
node02
node02
node02
node01
node01
head
head
node01
head
head
head
node01
node01
head
node01
head
head
node01
node01
It seems to work.
But when I run my actual job,
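A quick way to confirm that the tasks are spread evenly is to count them per host. This is only a sketch: tasks_per_host is a hypothetical helper name, not a Slurm command, and the printf line stands in for real srun output.

```shell
# Hypothetical helper: read one hostname per line and print "host count" pairs.
tasks_per_host() {
    # awk's iteration order is unspecified, so sort the result for stable output.
    awk '{ count[$1]++ } END { for (host in count) print host, count[host] }' | sort
}

# Stand-in input; on the cluster you would pipe the real command instead:
#   srun -p old -N3 -n24 hostname | tasks_per_host
printf 'node02\nnode01\nhead\nnode02\n' | tasks_per_host
```

For the 24 lines of output above, this would report 8 tasks each on head, node01 and node02.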
srun -p old -N3 -n24 ~/QE530-CPU/espresso-5.3.0/bin/pw.x
it produces errors,
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node02: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpiexec_node01: cannot connect to local mpd (/tmp/mpd2.console_rolly); possible causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
I believe the errors occur because the job is being started through mpiexec with Intel MPI, when it should be using mpirun.
How can I fix this problem?
Thanks!
I found my solution.
1) sudo apt-get install mpich
2) srun --mpi=pmi2
3) Make sure the environment variables related to MKL and Intel are loaded correctly.
I hope this helps someone with a similar problem.
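The steps above can be sketched as a small pre-flight check before launching. It assumes that MKLROOT and I_MPI_ROOT are the variables set by the Intel environment scripts on your install (that is not stated in the post), and missing_vars is a hypothetical helper. The srun line is only printed, not executed.

```shell
# Hypothetical helper: print the names of variables that are unset or empty.
missing_vars() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

# Assumption: these are the variables your Intel MPI / MKL setup scripts export.
missing="$(missing_vars MKLROOT I_MPI_ROOT)"

if [ -n "$missing" ]; then
    echo "Not set (load the Intel/MKL environment first):" $missing >&2
else
    # Printed for inspection; drop the echo to actually launch the job.
    echo "srun --mpi=pmi2 -p old -N3 -n24 $HOME/QE530-CPU/espresso-5.3.0/bin/pw.x"
fi
```

Checking the environment first makes a missing setup script show up as a clear message rather than as a failure inside pw.x.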