J'ai installé Cuda 10.1 et cudnn sur Ubuntu 18.04 et il semble être installé correctement en tant que type nvcc et nvidia-smi, j'obtiens une réponse correcte:
user:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
user:~$ nvidia-smi
Mon Mar 18 14:36:47 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K5200 Off | 00000000:03:00.0 On | Off |
| 26% 39C P8 14W / 150W | 225MiB / 8118MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1538 G /usr/lib/xorg/Xorg 32MiB |
| 0 1583 G /usr/bin/gnome-Shell 5MiB |
| 0 3008 G /usr/lib/xorg/Xorg 100MiB |
| 0 3120 G /usr/bin/gnome-Shell 82MiB |
+-----------------------------------------------------------------------------+
J'ai installé tensorflow en utilisant: user:~$ Sudo pip3 install --upgrade tensorflow-gpu
The directory '/home/amin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with Sudo, you may want Sudo's -H flag.
The directory '/home/amin/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with Sudo, you may want Sudo's -H flag.
Requirement already up-to-date: tensorflow-gpu in /usr/local/lib/python3.6/dist-packages (1.13.1)
Requirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.7)
Requirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (3.6.1)
Requirement already satisfied, skipping upgrade: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.32.3)
Requirement already satisfied, skipping upgrade: absl-py>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.0)
Requirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.9)
Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.2.2)
Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.1.0)
Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.18.0)
Requirement already satisfied, skipping upgrade: tensorflow-estimator<1.14.0rc0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.0)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.11.0)
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.13.3)
Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.1)
Requirement already satisfied, skipping upgrade: tensorboard<1.14.0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.1)
Requirement already satisfied, skipping upgrade: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.6->tensorflow-gpu) (2.9.0)
Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.6/dist-packages (from protobuf>=3.6.1->tensorflow-gpu) (40.6.3)
Requirement already satisfied, skipping upgrade: mock>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (2.0.0)
Requirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (0.14.1)
Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (3.0.1)
Requirement already satisfied, skipping upgrade: pbr>=0.11 in /usr/local/lib/python3.6/dist-packages (from mock>=2.0.0->tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (5.1.1)
Cependant, lorsque j'essaie d'importer tensorflow, j'obtiens une erreur à propos de libcublas.so.10.0:
user:~$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
Qu'est-ce qui me manque? et comment puis-je résoudre ce problème?
Merci
Amin,
Je reçois la même erreur lorsque j'essaie d'exécuter le didacticiel imagenet à partir du package de modèles tensorflow - https://github.com/tensorflow/models/tree/master/tutorials/image/imagenet
python3 classify_image.py
...
2019-07-21 22:29:58.367858: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-07-21 22:29:58.367982: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-07-21 22:29:58.368112: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-07-21 22:29:58.368234: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-07-21 22:29:58.368369: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-07-21 22:29:58.368498: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-07-21 22:29:58.374333: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
Je pense qu'il y a une incompatibilité de version quelque part et probablement tensorflow, repose toujours sur l'ancienne version des binaires fournis par les bibliothèques cuda. Aller à l'endroit où les binaires sont stockés et créer un lien nommé 10.0 mais cible 10.1 ou la version par défaut de la bibliothèque, semble résoudre le problème pour moi.
# cd /usr/lib/x86_64-linux-gnu
# ln -s libcudart.so.10.1 libcudart.so.10.0
# ln -s libcublas.so libcublas.so.10.0
# ln -s libcufft.so libcufft.so.10.0
# ln -s libcurand.so libcurand.so.10.0
# ln -s libcusolver.so libcusolver.so.10.0
# ln -s libcusparse.so libcusparse.so.10.0
Maintenant, je peux exécuter le didacticiel avec succès
2019-07-24 21:43:21.172908: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-24 21:43:21.174653: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-24 21:43:21.175826: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-24 21:43:21.182305: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-24 21:43:21.183970: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-24 21:43:21.206796: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-24 21:43:21.210685: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-24 21:43:21.212694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-24 21:43:21.213060: I tensorflow/core/platform/cpu_feature_guard.cc:142]
Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2019-07-24 21:43:21.238541: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3214745000 Hz
2019-07-24 21:43:21.240096: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557e2b682ce0 executing computations on platform Host. Devices:
2019-07-24 21:43:21.240162: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-07-24 21:43:21.355158: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557e2b652000 executing computations on platform CUDA. Devices:
2019-07-24 21:43:21.355234: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1060 6GB, Compute Capability 6.1
2019-07-24 21:43:21.357074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:01:00.0
2019-07-24 21:43:21.357151: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-24 21:43:21.357207: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-24 21:43:21.357245: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-24 21:43:21.357283: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-24 21:43:21.357321: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-24 21:43:21.357358: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-24 21:43:21.357395: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-24 21:43:21.360449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-24 21:43:21.380616: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-24 21:43:21.385223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 Edge matrix:
2019-07-24 21:43:21.385272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-07-24 21:43:21.385299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-07-24 21:43:21.388647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5250 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-07-24 21:43:32.001598: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-24 21:43:32.532105: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
W0724 21:43:34.981204 140284114071872 deprecation_wrapper.py:119] From classify_image.py:85: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
Le problème est dû à votre version actuelle de cuda qui est 10.1 (comme nous pouvons le voir dans le coin supérieur droit de votre image).
Comme vous pouvez le voir sur le site officiel de TF, la correspondance entre tf et cuda est la suivante: site TF pour le graphique
Version cuDNN CUDA
tensorflow-2.1.0 7.6 10.1
tensorflow-2.0.0 7.4 10.0
tensorflow_gpu-1.14.0 7.4 10.0
tensorflow_gpu-1.13.1 7.4 10.0
Ainsi, vous pouvez soit mettre à niveau votre tf vers 2.1, soit rétrograder votre cuda avec:
conda install cudatoolkit=10.0.130
Ensuite, il déclasserait automatiquement votre cudnn également.
Changer ma version tensorflow a résolu mon problème.
vérifiez ce problème 1https://github.com/tensorflow/tensorflow/issues/26182 )
Les binaires officiels tensorflow-gpu (celui téléchargé par pip ou conda) sont construits avec cuda 9.0, cudnn 7 depuis TF 1.5 et cuda 10.0, cudnn 7 depuis TF 1.13. Ceux-ci sont écrits dans les notes de version. Vous devez utiliser la version correspondante de cuda si vous utilisez les binaires officiels.