J'exécute tensorflow-gpu sur Windows 10 à l'aide d'un simple programme de réseau de neurones MINST. Lorsqu'il essaie de s'exécuter, il rencontre un CUBLAS_STATUS_ALLOC_FAILED
Erreur. Une recherche google ne donne rien.
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
return fn(*args)
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
status, run_metadata)
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
[[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
[[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="Edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
L'emplacement de la propriété "allow_growth" de la configuration de session semble être différent maintenant. C'est expliqué ici: https://www.tensorflow.org/tutorials/using_gp
Donc, actuellement, vous devez le définir comme ceci:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
J'ai trouvé que cette solution fonctionnait
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto(
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
# device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)
Sous Windows, tensorflow n'alloue pas actuellement toute la mémoire disponible comme indiqué dans la documentation, mais vous pouvez contourner cette erreur en permettant la croissance dynamique de la mémoire comme suit:
tf.Session(config=tf.ConfigProto(allow_growth=True))
Tensorflow 2.0 alpha
Autoriser la croissance de la mémoire GPU peut résoudre ce problème. Pour Tensorflow 2.0 alpha/nightly, vous pouvez essayer deux méthodes pour l'archiver.
1.)
import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()
2.)
import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you
# want to give to tensorflow.
Je vous suggère d'essayer les deux et de voir si cela aide. Source: https://www.tensorflow.org/alpha/guide/using_gp