I installed CUDA 9.0 and cuDNN 7.0 successfully, but building TensorFlow 1.4 from source fails.
My error message:
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
.................
WARNING: The lower priority option '-c opt' does not override the previous value '-c opt'
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 1042
_create_local_cuda_repository(repository_ctx)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 905, in _create_local_cuda_repository
_get_cuda_config(repository_ctx)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 662, in _get_cuda_config
_cudnn_version(repository_ctx, cudnn_install_base..., ...)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 360, in _cudnn_version
_find_cudnn_header_dir(repository_ctx, cudnn_install_base...)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 612, in _find_cudnn_header_dir
auto_configure_fail(("Cannot find cudnn.h under %s" ...))
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 129, in auto_configure_fail
fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: Cannot find cudnn.h under /usr/lib/x86_64-linux-gnu
WARNING: Target pattern parsing failed.
ERROR: error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 1042
_create_local_cuda_repository(repository_ctx)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 905, in _create_local_cuda_repository
_get_cuda_config(repository_ctx)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 662, in _get_cuda_config
_cudnn_version(repository_ctx, cudnn_install_base..., ...)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 360, in _cudnn_version
_find_cudnn_header_dir(repository_ctx, cudnn_install_base...)
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 612, in _find_cudnn_header_dir
auto_configure_fail(("Cannot find cudnn.h under %s" ...))
File "/home/sam/code/download/CNN/tensorflow_1.4/tensorflow/third_party/gpus/cuda_configure.bzl", line 129, in auto_configure_fail
fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: Cannot find cudnn.h under /usr/lib/x86_64-linux-gnu
INFO: Elapsed time: 3.466s
FAILED: Build did NOT complete successfully (0 packages loaded)
currently loading: tensorflow/tools/pip_package
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$
My CUDA 9.0 installation steps:
mkdir -p ~/code/download/lib/cuda/
cd ~/code/download/lib/cuda/
wget -c https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
chmod 777 cuda_9.0.176_384.81_linux-run
sudo apt-get install nvidia-375
sudo sh ./cuda_9.0.176_384.81_linux-run
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
My cuDNN 7.0 installation steps:
sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
My TensorFlow 1.4 configuration process:
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install bazel
sudo apt install python-dev python-pip python-nose gcc g++ git gfortran vim libopenblas-dev liblapack-dev libatlas-base-dev openjdk-8-jdk
sudo pip install -U --pre pip setuptools wheel
sudo pip install -U --pre numpy scipy matplotlib scikit-learn scikit-image
mkdir -p ~/code/download/CNN/tensorflow_1.4/
cd ~/code/download/CNN/tensorflow_1.4/
git clone https://github.com/tensorflow/tensorflow.git -b r1.4
cd tensorflow
./configure
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$ ./configure
You have bazel 0.7.0 installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
/home/sam/code/download/CNN/caffe_1.0_RC5/caffe-rc5/python
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: N
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: N
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: N
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL support? [y/N]: N
No OpenCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.0
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7.0
Please specify the location where cuDNN 7.0 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:/usr/lib/x86_64-linux-gnu/
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.0]
Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: y
MPI support will be enabled for TensorFlow.
Please specify the MPI toolkit folder. [Default is /usr]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$
What can I do next?
Thanks ~
=======================
I solved the problem above by installing one more deb package, the cuDNN development package that provides cudnn.h:
sudo dpkg -i libcudnn7-dev_7.0.4.31-1+cuda9.0_amd64.deb
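To confirm that the header is visible before rebuilding, a check along these lines may help. It is only a sketch: `first_existing` is a hypothetical helper, and the candidate paths (the directory given to ./configure, its include/ subdirectory, and /usr/include) are assumptions about where the -dev package and the configure script look, not something the log confirms.

```shell
#!/bin/sh
# Hypothetical helper: print the first path in the argument list that exists.
first_existing() {
  for p in "$@"; do
    if [ -e "$p" ]; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}

# Candidate locations are assumptions; override CUDNN_BASE if cuDNN lives elsewhere.
CUDNN_BASE="${CUDNN_BASE:-/usr/lib/x86_64-linux-gnu}"
if header="$(first_existing "$CUDNN_BASE/cudnn.h" "$CUDNN_BASE/include/cudnn.h" /usr/include/cudnn.h)"; then
  echo "cudnn.h found at $header"
else
  echo "cudnn.h still missing; check that the libcudnn7-dev package installed correctly"
fi
```

If the header is reported as found, re-running ./configure with the same cuDNN path should get past the "Cannot find cudnn.h" step.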
Then I compiled TensorFlow with this command:
bazel build -c opt --copt=-march="haswell" --config=cuda //tensorflow/tools/pip_package:build_pip_package
And it produces another error:
tensorflow/contrib/batching/kernels/batch_kernels.cc:258:19: note: 'batcher_queue' was declared here
BatcherQueue* batcher_queue;
^
ERROR: /home/sam/code/download/CNN/tensorflow_1.4/tensorflow/tensorflow/python/BUILD:1232:1: Linking of rule '//tensorflow/python:gen_checkpoint_ops_py_wrappers_cc' failed (Exit 1)
/usr/bin/ld: warning: libcufft.so.9.0, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Ucheckpoint_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to '[email protected]'
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2439.556s, Critical Path: 155.31s
FAILED: Build did NOT complete successfully
sam@sam:~/code/download/CNN/tensorflow_1.4/tensorflow$
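The linker warning says libcufft.so.9.0 was not found. One possible cause, which the log does not confirm, is that the library sits in /usr/local/cuda/lib64 but that directory is not known to the dynamic linker cache, since the exports shown earlier only affect the current shell. A diagnostic sketch, where `dirs_with_lib` is a hypothetical helper and the CUDA path is the default from the configure step:

```shell
#!/bin/sh
# Hypothetical helper: print each directory in a colon-separated list
# that contains the named library file.
dirs_with_lib() (
  lib="$1"
  list="$2"
  IFS=':'   # split the list on colons (subshell, so IFS is not changed for the caller)
  set -f    # no globbing while splitting
  for d in $list; do
    if [ -n "$d" ] && [ -e "$d/$lib" ]; then
      printf '%s\n' "$d"
    fi
  done
)

# Where does libcufft.so.9.0 live among the CUDA lib dir and LD_LIBRARY_PATH?
dirs_with_lib libcufft.so.9.0 "/usr/local/cuda/lib64:${LD_LIBRARY_PATH:-}"

# Is it registered in the dynamic linker cache?
if command -v ldconfig >/dev/null 2>&1; then
  ldconfig -p | grep libcufft || echo "libcufft is not in the ldconfig cache"
fi
```

If the file exists on disk but is missing from the cache, registering the directory (for example with a file under /etc/ld.so.conf.d followed by `sudo ldconfig`) is one thing to try; whether that resolves this particular link error is untested here.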
What can I do next?
Thanks ~