I recently installed an NVIDIA Titan Xp (Galactic Empire Edition) in our Dell PowerEdge R730 server, which runs Ubuntu 16.04.3 LTS. I installed the drivers following the instructions from the link:
sudo apt-get install nvidia-387 nvidia-settings
I also installed CUDA following the instructions from the link, which include disabling the nouveau driver.
Now, suddenly, we can no longer log in to the server via remote desktop (NX NoMachine). When I enter my credentials in the Ubuntu login window, it takes a few seconds, then the screen goes black and I am returned to the login screen.
This is the output I am getting from ~/.xsession-errors:
Xlib: extension "GLX" missing on display ":0".
Xlib: extension "GLX" missing on display ":0".
openConnection: connect: No such file or directory
cannot connect to brltty at :0
upstart: gnome-session (Unity) main process (5698) terminated with status 1
upstart: unity-settings-daemon main process (5689) killed by TERM signal
upstart: logrotate main process (5542) killed by TERM signal
upstart: hud main process (5687) killed by TERM signal
upstart: indicator-bluetooth main process (5721) killed by TERM signal
upstart: indicator-power main process (5722) killed by TERM signal
upstart: indicator-datetime main process (5725) killed by TERM signal
upstart: indicator-printers main process (5735) killed by TERM signal
upstart: unity-panel-service main process (5702) killed by TERM signal
upstart: indicator-session main process (5736) killed by TERM signal
upstart: indicator-application main process (5778) killed by TERM signal
upstart: bamfdaemon main process (5669) killed by TERM signal
upstart: unity7 pre-start process (5691) terminated with status 143
upstart: Disconnected from notified D-Bus bus
upstart: indicator-sound main process (5732) killed by TERM signal
And this is what I am getting from /var/log/syslog:
Dec 18 11:49:03 sauron systemd[1]: Started Session c3 of user username.
Dec 18 11:49:05 sauron systemd[1]: Starting NVIDIA Persistence Daemon...
Dec 18 11:49:05 sauron nvidia-persistenced: Verbose syslog connection opened
Dec 18 11:49:05 sauron nvidia-persistenced: Now running with user ID 124 and group ID 130
Dec 18 11:49:05 sauron nvidia-persistenced: Started (5348)
Dec 18 11:49:05 sauron systemd[1]: Started NVIDIA Persistence Daemon.
Dec 18 11:49:06 sauron nvidia-persistenced: device 0000:82:00.0 - registered
Dec 18 11:49:06 sauron nvidia-persistenced: Local RPC service initialized
Dec 18 11:49:12 sauron org.gtk.vfs.Daemon[2691]: A connection to the bus can't be made
Dec 18 11:49:12 sauron org.gnome.ScreenSaver[2691]: ** Message: Got disconnected from the session message bus; retrying to reconnect every 10 seconds
Dec 18 11:49:12 sauron dbus[2112]: [system] Activating via systemd: service name='org.bluez' unit='dbus-org.bluez.service'
Dec 18 11:49:12 sauron systemd[1]: Started Session c4 of user username.
Dec 18 11:49:13 sauron org.a11y.Bus[5552]: Activating service name='org.a11y.atspi.Registry'
Dec 18 11:49:13 sauron org.a11y.Bus[5552]: ** (process:5646): WARNING **: Failed to register client: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files
Dec 18 11:49:13 sauron org.a11y.Bus[5552]: Successfully activated service 'org.a11y.atspi.Registry'
Dec 18 11:49:13 sauron org.a11y.atspi.Registry[5655]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
Dec 18 11:49:14 sauron org.ayatana.bamf[5552]: bamfdaemon start/running, process 5669
Dec 18 11:49:15 sauron dbus[2112]: [system] Activating via systemd: service name='org.freedesktop.UPower' unit='upower.service'
Dec 18 11:49:15 sauron systemd[1]: Starting Daemon for power management...
Dec 18 11:49:15 sauron rtkit-daemon[3071]: Successfully made thread 5763 of process 5763 (n/a) owned by '1001' high priority at nice level -11.
Dec 18 11:49:15 sauron rtkit-daemon[3071]: Supervising 3 threads of 2 processes of 2 users.
Dec 18 11:49:15 sauron org.gnome.ScreenSaver[5552]: Xlib: extension "GLX" missing on display ":0".
Dec 18 11:49:15 sauron org.gnome.ScreenSaver[5552]: ** (gnome-screensaver:5756): WARNING **: Couldn't get presence status: The name org.gnome.SessionManager was not provided by any .service files
Dec 18 11:49:15 sauron dbus[2112]: [system] Successfully activated service 'org.freedesktop.UPower'
Dec 18 11:49:15 sauron systemd[1]: Started Daemon for power management.
Dec 18 11:49:15 sauron gnome-session[5698]: Xlib: extension "GLX" missing on display ":0".
Dec 18 11:49:16 sauron gnome-session[5698]: message repeated 3 times: [ Xlib: extension "GLX" missing on display ":0".]
Dec 18 11:49:16 sauron gnome-session[5698]: gnome-session-is-accelerated: No hardware 3D support.
Dec 18 11:49:16 sauron gnome-session[5698]: Xlib: extension "GLX" missing on display ":0".
Dec 18 11:49:16 sauron gnome-session[5698]: gnome-session-check-accelerated: Helper exited with code 256
Dec 18 11:49:16 sauron gnome-session[5698]: gnome-session-binary[5698]: CRITICAL: We failed, but the fail whale is dead. Sorry....
Dec 18 11:49:16 sauron gnome-session-binary[5698]: CRITICAL: We failed, but the fail whale is dead. Sorry....
Dec 18 11:49:16 sauron rtkit-daemon[3071]: Successfully made thread 5816 of process 5816 (n/a) owned by '1001' high priority at nice level -11.
Dec 18 11:49:16 sauron rtkit-daemon[3071]: Supervising 4 threads of 3 processes of 2 users.
Dec 18 11:49:16 sauron pulseaudio[5816]: [pulseaudio] pid.c: Daemon already running.
Dec 18 11:49:16 sauron org.gnome.evolution.dataserver.Sources5[5552]: (evolution-source-registry:5765): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
Dec 18 11:49:16 sauron org.gnome.evolution.dataserver.Sources5[5552]: (evolution-source-registry:5765): GVFS-WARNING **: Error creating proxy: Error calling StartServiceByName for org.gtk.vfs.Daemon: The connection is closed (g-io-error-quark, 18)
Dec 18 11:49:16 sauron org.gnome.evolution.dataserver.Sources5[5552]: (evolution-source-registry:5765): GLib-GIO-CRITICAL **: Error while sending AddMatch() message: The connection is closed
Dec 18 11:49:16 sauron org.gnome.evolution.dataserver.Sources5[5552]: (evolution-source-registry:5765): GLib-GIO-CRITICAL **: g_dbus_interface_skeleton_unexport: assertion 'interface_->priv->connections != NULL' failed
Dec 18 11:49:16 sauron lightdm[2614]: /etc/modprobe.d is not a file
Dec 18 11:49:16 sauron lightdm[2614]: message repeated 4 times: [ /etc/modprobe.d is not a file]
Dec 18 11:49:16 sauron lightdm[2614]: update-alternatives: error: no alternatives for x86_64-linux-gnu_gfxcore_conf
Dec 18 11:49:16 sauron systemd[1]: Started Session c5 of user lightdm.
Dec 18 11:49:16 sauron pulseaudio[5763]: [pulseaudio] module-alsa-card.c: Failed to open mixer for jack detection
Dec 18 11:49:16 sauron pulseaudio[5763]: [pulseaudio] server-lookup.c: Unable to contact D-Bus: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-BVBIhfFbgN: Connection refused
Dec 18 11:49:16 sauron pulseaudio[5763]: [pulseaudio] main.c: Unable to contact D-Bus: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-BVBIhfFbgN: Connection refused
Dec 18 11:49:16 sauron org.a11y.atspi.Registry[5876]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
Dec 18 11:49:17 sauron dbus[2112]: [system] Activating via systemd: service name='org.freedesktop.ColorManager' unit='colord.service'
Dec 18 11:49:17 sauron systemd[1]: Starting Manage, Install and Generate Color Profiles...
Dec 18 11:49:17 sauron dbus[2112]: [system] Successfully activated service 'org.freedesktop.ColorManager'
Dec 18 11:49:17 sauron systemd[1]: Started Manage, Install and Generate Color Profiles.
Dec 18 11:49:19 sauron systemd[1]: Stopping NVIDIA Persistence Daemon...
Dec 18 11:49:19 sauron nvidia-persistenced: Received signal 15
Dec 18 11:49:19 sauron nvidia-persistenced: Socket closed.
Dec 18 11:49:19 sauron nvidia-persistenced: PID file unlocked.
Dec 18 11:49:19 sauron nvidia-persistenced: PID file closed.
Dec 18 11:49:19 sauron nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Dec 18 11:49:19 sauron nvidia-persistenced: Shutdown (5348)
Dec 18 11:49:19 sauron systemd[1]: Stopped NVIDIA Persistence Daemon.
Dec 18 11:49:30 sauron systemd[1]: Started Session c6 of user lightdm.
Dec 18 11:49:34 sauron systemd[1]: Starting NVIDIA Persistence Daemon...
Dec 18 11:49:34 sauron nvidia-persistenced: Verbose syslog connection opened
Dec 18 11:49:34 sauron nvidia-persistenced: Now running with user ID 124 and group ID 130
Dec 18 11:49:34 sauron nvidia-persistenced: Started (6161)
Dec 18 11:49:34 sauron systemd[1]: Started NVIDIA Persistence Daemon.
Dec 18 11:49:35 sauron nvidia-persistenced: device 0000:82:00.0 - registered
Dec 18 11:49:35 sauron nvidia-persistenced: Local RPC service initialized
Dec 18 11:49:37 sauron pulseaudio[5763]: [pulseaudio] bluez5-util.c: GetManagedObjects() failed: org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.bluez': timed out
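The syslog excerpt above is noisy, so to see which processes report the missing GLX extension, a small filter along these lines might help. This is only a rough sketch: the function name `glx_complaints` and the `sample` list are made up for illustration, and the regular expression assumes the usual "date host process[pid]: message" syslog layout seen in the excerpt.

```python
import re
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS host process[pid]: message" (pid is optional).
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+(?P<proc>[^\s\[:]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

def glx_complaints(lines):
    """Count, per process name, the log lines that mention the missing GLX extension.

    A "message repeated N times" line is counted once, not N times.
    """
    counts = Counter()
    for line in lines:
        m = SYSLOG_LINE.match(line)
        if m and 'extension "GLX" missing' in m.group("msg"):
            counts[m.group("proc")] += 1
    return counts

# Hypothetical sample, modeled on lines from the excerpt above.
sample = [
    'Dec 18 11:49:15 sauron org.gnome.ScreenSaver[5552]: Xlib: extension "GLX" missing on display ":0".',
    'Dec 18 11:49:15 sauron systemd[1]: Started Daemon for power management.',
    'Dec 18 11:49:15 sauron gnome-session[5698]: Xlib: extension "GLX" missing on display ":0".',
    'Dec 18 11:49:16 sauron gnome-session[5698]: Xlib: extension "GLX" missing on display ":0".',
]

for proc, n in glx_complaints(sample).most_common():
    print(proc, n)
```

To run it against the real log, the open file can be passed in directly, for example `glx_complaints(open("/var/log/syslog"))`, assuming the account has read access to that file.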
The graphics card is definitely working. I can see it here:
lspci -nnk | grep -i "VGA\|'Kern'\|3D\|Display" -A2
0a:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. G200eR2 [102b:0534] (rev 01)
DeviceName: Embedded Video
Subsystem: Dell G200eR2 [1028:0600]
--
82:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:123f]
Kernel driver in use: nvidia
nvidia-smi also shows it, and I built a CUDA algorithm that runs fine and very fast on the card. So why does the remote desktop have problems with it?
Thank you very much in advance!