web-dev-qa-db-fra.com

Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED

I'm running tensorflow-gpu on Windows 10 with a simple MNIST neural-network program. When it tries to run, it hits a CUBLAS_STATUS_ALLOC_FAILED error. A Google search turns up nothing.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="Edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
15 votes · Axiverse

The location of the session config's "allow_growth" property seems to have changed. It's explained here: https://www.tensorflow.org/tutorials/using_gp

So currently you have to set it like this:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
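As a quick sanity check (a hypothetical sketch, assuming tensorflow-gpu 1.x and a CUDA-capable GPU; the placeholder shapes are taken from the traceback above, the rest is illustrative), the config can be passed to the session that runs the graph:

```python
import numpy as np
import tensorflow as tf

# Hypothetical sketch: reproduce the failing (100, 784) x (784, 256) MatMul
# from the traceback, with allow_growth enabled so cuBLAS can obtain a handle.
a = tf.placeholder(tf.float32, shape=(100, 784))
w = tf.Variable(tf.zeros((784, 256)))
y = tf.matmul(a, w)

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grab GPU memory on demand, not all at once

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={a: np.zeros((100, 784), dtype=np.float32)})
    print(out.shape)  # (100, 256)
```

With `allow_growth` unset, TensorFlow pre-allocates nearly all GPU memory at session creation, which on Windows can leave cuBLAS unable to allocate its workspace and trigger the error above.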
19 votes · Rafal Zajac

I found that this solution works:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count={'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)
7 votes · Space Bear

On Windows, tensorflow does not currently allocate all available memory as the documentation says; instead, you can work around this error by allowing dynamic memory growth as follows:

tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))
4 votes · Axiverse

Tensorflow 2.0 alpha

Allowing GPU memory growth may solve this issue. For Tensorflow 2.0 alpha/nightly, there are two methods you can try to achieve it.

1.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you 
                                                   # want to give to tensorflow.

I suggest trying both and seeing if it helps. Source: https://www.tensorflow.org/alpha/guide/using_gp
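On stable TF 2.x releases the alpha `tf.config.gpu.*` calls above were renamed; the equivalent (an assumption based on the `tf.config.experimental` API, not stated in the answer) is to enable memory growth per physical device:

```python
import tensorflow as tf

# Enable on-demand memory growth for every visible GPU.
# Must be called before any GPU has been initialized.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```

This is a GPU configuration fragment: on a machine without a GPU the loop simply does nothing, and calling it after the runtime has touched the GPU raises a RuntimeError.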
