Failed to initialize NVML: Driver/library version mismatch [closed]

Can you help me fix this error?

mona@pascal:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
mona@pascal:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
mona@pascal:~$ lsmod | grep -i nvidia
nvidia               8643887  0 
drm                   303102  1 nvidia

I also get this in dmesg:

mona@pascal:~$ dmesg | grep -i nvidia
[623245.802854] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.814561] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.814568] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.826374] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.826382] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.838521] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.838529] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.850499] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.850508] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[623245.863736] NVRM: make sure that this kernel module and all NVIDIA driver
[623245.863744] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22

I also get this error when running the code below:

mona@pascal:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
modprobe: ERROR: could not insert 'nvidia_361_uvm': Invalid argument
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_UNKNOWN
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:153] retrieving CUDA diagnostic information for host: pascal
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: pascal
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: 361.93.2
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:356] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.99  Mon Jul  4 23:52:14 PDT 2016
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) 
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: 352.99.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:296] kernel version 352.99.0 does not match DSO version 361.93.2 -- cannot find working devices in this configuration
I tensorflow/core/common_runtime/gpu/gpu_init.cc:81] No GPU devices available on machine.
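
The diagnostic above reports kernel module 352.99 against a user-space library (DSO) at 361.93.2. A minimal sketch of the same check in Python, assuming the kernel side is read from the text of /proc/driver/nvidia/version (the "driver version file contents" quoted in the log). The function names are hypothetical, and comparing only the components both versions share is a simplifying assumption, so that 352.99 and 352.99.0 count as equal:

```python
import re


def parse_nvrm_version(text):
    """Extract the driver version from /proc/driver/nvidia/version contents."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def version_tuple(version):
    """Turn '361.93.02' into (361, 93, 2) so zero-padding does not matter."""
    return tuple(int(part) for part in version.split("."))


def versions_match(kernel_version, library_version):
    """Compare only the leading components that both versions have."""
    kernel = version_tuple(kernel_version)
    library = version_tuple(library_version)
    shared = min(len(kernel), len(library))
    return kernel[:shared] == library[:shared]


# Sample line copied from the TensorFlow diagnostic output above.
sample = ("NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.99  "
          "Mon Jul  4 23:52:14 PDT 2016")
kernel = parse_nvrm_version(sample)
print(kernel, versions_match(kernel, "361.93.2"))
```

On a live system the sample string would be replaced by the contents of /proc/driver/nvidia/version, and the library version by whatever the user-space side reports (here taken from the "libcuda reported version" log line).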

I also have:

mona@pascal:~$ modprobe --resolve-alias nvidia
nvidia_361

mona@pascal:~$ grep -r nvidia /etc/modprobe.d/
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/fbdev-blacklist.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-361_hybrid.conf:# This file was installed by nvidia-361
/etc/modprobe.d/nvidia-352_hybrid.conf:# This file was installed by nvidia-352

mona@pascal:~$ modinfo nvidia-current
modinfo: ERROR: Module nvidia-current not found.

and

mona@pascal:~$ sudo dkms status
bbswitch, 0.7, 3.13.0-62-generic, x86_64: installed
nvidia-361, 361.93.02, 3.13.0-62-generic, x86_64: installed

Also, this is the output of cat /var/log/apt/history.log:

link

Additional information:

mona@pascal:~$ find /lib/modules/$(uname -r) -name '*nvidia*.ko' -ls
28573964 1192 -rw-r--r--   1 root     root      1217712 Sep 29 17:55 /lib/modules/3.13.0-62-generic/updates/dkms/nvidia_361_uvm.ko
28573929  996 -rw-r--r--   1 root     root      1017864 Sep 29 17:55 /lib/modules/3.13.0-62-generic/updates/dkms/nvidia_361_modeset.ko
28573923 13768 -rw-r--r--   1 root     root     14095896 Sep 29 17:55 /lib/modules/3.13.0-62-generic/updates/dkms/nvidia_361.ko
28580212   72 -rw-r--r--   1 root     root        69700 Aug 11  2015 /lib/modules/3.13.0-62-generic/kernel/drivers/video/nvidia/nvidiafb.ko

Any systematic approach to debugging this problem would be greatly appreciated.

    
asked by Mona Jalal on 29.09.2016 at 23:24

0 answers