How do I install TensorFlow with CUDA 9.0 and cuDNN 7.0?

2

I installed CUDA 9.0 and cuDNN 7.0 successfully, but building TensorFlow 1.4 failed.

My error message:

sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
.................
WARNING: The lower priority option '-c opt' does not override the previous value '-c opt'
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 1042
        _create_local_cuda_repository(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 905, in _create_local_cuda_repository
        _get_cuda_config(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 662, in _get_cuda_config
        _cudnn_version(repository_ctx, cudnn_install_base..., ...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 360, in _cudnn_version
        _find_cudnn_header_dir(repository_ctx, cudnn_install_base...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 612, in _find_cudnn_header_dir
        auto_configure_fail(("Cannot find cudnn.h under %s" ...))
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 129, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find cudnn.h under /usr/lib/x86_64-linux-gnu
WARNING: Target pattern parsing failed.
ERROR: error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 1042
        _create_local_cuda_repository(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 905, in _create_local_cuda_repository
        _get_cuda_config(repository_ctx)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 662, in _get_cuda_config
        _cudnn_version(repository_ctx, cudnn_install_base..., ...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 360, in _cudnn_version
        _find_cudnn_header_dir(repository_ctx, cudnn_install_base...)
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 612, in _find_cudnn_header_dir
        auto_configure_fail(("Cannot find cudnn.h under %s" ...))
    File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 129, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: Cannot find cudnn.h under /usr/lib/x86_64-linux-gnu
INFO: Elapsed time: 3.466s
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

My CUDA 9.0 installation steps:

mkdir -p ~/code/download/lib/cuda/
cd ~/code/download/lib/cuda/
wget -c https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
chmod 777 cuda_9.0.176_384.81_linux-run
sudo apt-get install nvidia-375
sudo sh ./cuda_9.0.176_384.81_linux-run
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
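
The `${VAR:+:${VAR}}` expansion in the export lines above adds a colon plus the old value only when the variable is already set and non-empty, so an empty variable does not leave a trailing colon (an empty entry in these search paths means the current directory). A small sketch of that behaviour; `prepend_path` is a hypothetical helper used only for illustration, not part of CUDA:

```shell
# Hypothetical helper: print "$1", followed by ":<old value>" only when
# the old value ($2) is non-empty.
prepend_path() {
    new_dir=$1
    old_value=$2
    printf '%s\n' "${new_dir}${old_value:+:${old_value}}"
}

prepend_path /usr/local/cuda/bin ""          # prints /usr/local/cuda/bin
prepend_path /usr/local/cuda/bin /usr/bin    # prints /usr/local/cuda/bin:/usr/bin
```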

My cuDNN 7.0 installation steps:

sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb

My TensorFlow 1.4 configuration process:

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install bazel
sudo apt install python-dev python-pip python-nose gcc g++ git gfortran vim libopenblas-dev liblapack-dev libatlas-base-dev openjdk-8-jdk
sudo pip install -U --pre pip setuptools wheel
sudo pip install -U --pre numpy scipy matplotlib scikit-learn scikit-image
mkdir -p ~/code/download/CNN/tensorflow_1.4/
cd ~/code/download/CNN/tensorflow_1.4/
git clone https://github.com/tensorflow/tensorflow.git -b r1.4
cd tensorflow
./configure
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ ./configure
You have bazel 0.7.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
  /home/sam/code/download/CNN/caffe_1.0_RC5/caffe-rc5/python
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: N
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: N
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: N
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL support? [y/N]: N
No OpenCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.0


Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7.0


Please specify the location where cuDNN 7.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/lib/x86_64-linux-gnu/


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.0]


Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: y
MPI support will be enabled for TensorFlow.

Please specify the MPI toolkit folder. [Default is /usr]: 


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 


Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

What should I do next?

Thanks ~

=======================

I solved the problem above by installing another deb, the cuDNN development package:

sudo dpkg -i libcudnn7-dev_7.0.4.31-1+cuda9.0_amd64.deb
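
The first build failed because cudnn.h could not be found: the runtime package libcudnn7 ships only the shared library, while the -dev package adds the header. Before rebuilding, it may be worth checking where the header actually is. A minimal sketch; the helper name `find_cudnn_header` and the candidate directories are assumptions for illustration, not the exact search logic of cuda_configure.bzl:

```shell
# Hypothetical helper: print the first directory among the arguments
# that contains cudnn.h, or return 1 if none of them does.
find_cudnn_header() {
    for dir in "$@"; do
        if [ -f "$dir/cudnn.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; adjust them to your system.
find_cudnn_header /usr/include /usr/include/x86_64-linux-gnu /usr/local/cuda/include \
    || echo "cudnn.h not found in the listed directories"
```

If a directory is printed, that is a reasonable value to try when ./configure asks where cuDNN is installed.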

Then I built TensorFlow with the command:

bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package

And it produced another error:

tensorflow/contrib/batching/kernels/batch_kernels.cc:258:19: note: 'batcher_queue' was declared here
     BatcherQueue* batcher_queue;
                   ^
ERROR: /home/sam/code/download/CNN/tensorflow_1.4/tensorflow/tensorflow/python/BUILD:1232:1: Linking of rule '//tensorflow/python:gen_checkpoint_ops_py_wrappers_cc' failed (Exit 1)
/usr/bin/ld: warning: libcufft.so.9.0, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2439.556s, Critical Path: 155.31s
FAILED: Build did NOT complete successfully
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

What should I do next?

Thanks ~

    
by sam 19.11.2017 / 16:45

2 answers

1

I found the answer!

I needed to create a symbolic link:

sudo ln -s /usr/local/cuda-9.0/lib64/libcufft.so /usr/lib/libcufft.so.9.0

Then I re-ran ./configure with MPI support disabled.

After that, the build command succeeded!
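
One way to repeat the configuration with MPI turned off, without answering every prompt again, is to preset the answers as environment variables. This is only a sketch: it assumes the r1.4 configure script reads the variables below, so check configure.py in your checkout before relying on them. The values mirror the answers given at the prompts in the question:

```shell
# Assumed variable names; verify them against configure.py in the r1.4 tree.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.0
export TF_CUDNN_VERSION=7.0
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu
export TF_NEED_MPI=0    # the change described above: no MPI support

# Then, from the TensorFlow source directory, run:
#   ./configure
```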

At global scope:
cc1plus: warning: unrecognized command line option '-Wno-self-assign'
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 275.306s, Critical Path: 36.05s
INFO: Build completed successfully, 602 total actions
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

Then I ran:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
一 11月 20 09:53:08 CST 2017 : === Using tmpdir: /tmp/tmp.xpC8nRamZR
~/code/download/CNN/tensorflow_1.4/tensorflow/bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles ~/code/download/CNN/tensorflow_1.4/tensorflow
~/code/download/CNN/tensorflow_1.4/tensorflow
/tmp/tmp.xpC8nRamZR ~/code/download/CNN/tensorflow_1.4/tensorflow
一 11月 20 09:53:10 CST 2017 : === Building wheel
warning: no files found matching '*.dll' under directory '*'
warning: no files found matching '*.lib' under directory '*'
warning: no files found matching '*.h' under directory 'tensorflow/include/tensorflow'
warning: no files found matching '*' under directory 'tensorflow/include/Eigen'
warning: no files found matching '*' under directory 'tensorflow/include/external'
warning: no files found matching '*.h' under directory 'tensorflow/include/google'
warning: no files found matching '*' under directory 'tensorflow/include/third_party'
warning: no files found matching '*' under directory 'tensorflow/include/unsupported'
~/code/download/CNN/tensorflow_1.4/tensorflow
一 11月 20 09:53:35 CST 2017 : === Output wheel file is in: /tmp/tensorflow_pkg
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 

Then I found that it had created the TensorFlow whl file:

sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ ls /tmp/tensorflow_pkg
tensorflow-1.4.1-cp27-cp27mu-linux_x86_64.whl
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ 
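
The wheel file name encodes what the package was built for: distribution name, version, Python tag, ABI tag, and platform tag, separated by hyphens. A small sketch that splits the name shown above; `describe_wheel` is a hypothetical helper, and it assumes a name with exactly five fields (no optional build tag):

```shell
# Hypothetical helper: split a wheel file name into its five tags.
describe_wheel() {
    base=${1%.whl}             # drop the .whl extension
    old_ifs=$IFS
    IFS=-
    set -- $base               # split on hyphens into $1..$5
    IFS=$old_ifs
    printf 'name=%s version=%s python=%s abi=%s platform=%s\n' "$1" "$2" "$3" "$4" "$5"
}

describe_wheel tensorflow-1.4.1-cp27-cp27mu-linux_x86_64.whl
# prints: name=tensorflow version=1.4.1 python=cp27 abi=cp27mu platform=linux_x86_64
```

Here cp27 means CPython 2.7, which matches the /usr/bin/python chosen during ./configure, so the wheel should be installed with that same interpreter's pip.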

Then I removed the old TensorFlow:

sudo pip uninstall tensorflow-gpu
sudo pip uninstall tensorflow-tensorboard

I installed the new one that I had compiled successfully:

sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-1.4.1-cp27-cp27mu-linux_x86_64.whl

Then I created the CUDA link:

sudo ln -s /usr/local/cuda-9.0/lib64/libcusolver.so /usr/lib/libcusolver.so.9.0
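
Both symbolic links in this answer make a versioned CUDA library (libcufft.so.9.0, libcusolver.so.9.0) visible under /usr/lib. Before adding links by hand, it can help to list which versioned libraries are actually present in the toolkit's lib64 directory. A minimal sketch; `missing_libs` is a hypothetical helper, and the library names passed to it are just the two that appear in this thread:

```shell
# Hypothetical helper: given a directory and a list of file names,
# print each name that does not exist in that directory.
missing_libs() {
    lib_dir=$1
    shift
    for name in "$@"; do
        [ -e "$lib_dir/$name" ] || printf '%s\n' "$name"
    done
}

# Directory taken from the ln commands in this answer.
missing_libs /usr/local/cuda-9.0/lib64 libcufft.so.9.0 libcusolver.so.9.0
```

If the files exist there but still cannot be found at link or import time, another common approach is to add that directory to the dynamic linker configuration (a file under /etc/ld.so.conf.d/ followed by sudo ldconfig) rather than linking each library into /usr/lib.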

Then I tested that TensorFlow imports successfully:

sam@sam:~/code/download/lib/cudnn7$ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
/usr/local/lib/python2.7/dist-packages/tensorflow
sam@sam:~/code/download/lib/cudnn7$ 

Thanks ~

    
by sam 20.11.2017 / 04:05
-1

The GPU / CUDA prerequisites for the latest TensorFlow (v1.5+) are given below

Easy CUDA-9.0 and cuDNN-7.0 installation link

    
by Ashok Kumar Pant 06.03.2018 / 05:31
