Restoring a TensorFlow model

I'm trying to restore a TensorFlow model. I followed this example: http://nasdag.github.io/blog/2016/01/19/classifying-bees-with-google-tensorflow/

At the end of the example's code, I added the following lines:

saver = tf.train.Saver()
save_path = saver.save(sess, "model.ckpt")
print("Model saved in file: %s" % save_path)

Two files were created: checkpoint and model.ckpt.

In a new Python file (tomas_bees_predict.py), I have this code:

import tensorflow as tf

saver = tf.train.Saver()

with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "model.ckpt")
  print("Model restored.")

However, when I run the code, I get this error:

Traceback (most recent call last):
  File "tomas_bees_predict.py", line 3, in <module>
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 705, in __init__
    raise ValueError("No variables to save")
ValueError: No variables to save

Is there a way to read the model.ckpt file and see which variables are saved? Or could someone help me save the model and restore it using the example described above?

EDIT 1:

I tried running the same code to recreate the model structure, and I still got the error. I think it could be related to the fact that the code described here does not use named variables: http://nasdag.github.io/blog/2016/01/19/classifying-bees-with-google-tensorflow/

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

So I ran this experiment. I wrote two versions of the code that saves the model (with and without named variables), plus the code that restores the model.

tensor_save_named_vars.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(1, name="v1")
v2 = tf.Variable(2, name="v2")

# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print "Model saved in file: ", save_path

tensor_save_not_named_vars.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(1)
v2 = tf.Variable(2)

# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print "Model saved in file: ", save_path

tensor_restore.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(0, name="v1")
v2 = tf.Variable(0, name="v2")

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print "Model restored."
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()

Here is what I get when I run this code:

$ python tensor_save_named_vars.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
v1 =  1
v2 =  2
Model saved in file:  /tmp/model.ckpt

$ python tensor_restore.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
Model restored.
v1 =  1
v2 =  2

$ python tensor_save_not_named_vars.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
v1 =  1
v2 =  2
Model saved in file:  /tmp/model.ckpt

$ python tensor_restore.py 
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
W tensorflow/core/common_runtime/executor.cc:1076] 0x7ff953881e40 Compute status: Not found: Tensor name "v2" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
W tensorflow/core/common_runtime/executor.cc:1076] 0x7ff953881e40 Compute status: Not found: Tensor name "v1" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice/tensor_name, save/restore_slice/shape_and_slice)]]
Traceback (most recent call last):
  File "tensor_restore.py", line 14, in <module>
    saver.restore(sess, "/tmp/model.ckpt")
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 891, in restore
    sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 368, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 444, in _do_run
    e.code)
tensorflow.python.framework.errors.NotFoundError: Tensor name "v2" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
Caused by op u'save/restore_slice_1', defined at:
  File "tensor_restore.py", line 8, in <module>
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 713, in __init__
    restore_sequentially=restore_sequentially)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 432, in build
    filename_tensor, vars_to_save, restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 191, in _AddRestoreOps
    values = self.restore_op(filename_tensor, vs, preferred_shard)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 106, in restore_op
    preferred_shard=preferred_shard)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/io_ops.py", line 189, in _restore_slice
    preferred_shard, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 271, in _restore_slice
    preferred_shard=preferred_shard, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 664, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1834, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1043, in __init__
    self._traceback = _extract_stack()

So maybe the original code (see the external link above) could be modified to look like this:

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  weight_var = tf.Variable(initial, name="weight_var")
  return weight_var

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  bias_var = tf.Variable(initial, name="bias_var")
  return bias_var

But then my question is: is restoring the weight_var and bias_var variables enough to implement prediction? I did the training on a powerful machine with a GPU, and I would like to copy the model to a less powerful computer without a GPU to run predictions.
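
The experiment above suggests that restoring matches variables by name. The following is a minimal pure-Python sketch of that behaviour, not TensorFlow code: ToyGraph, save and restore are hypothetical stand-ins, and the default names Variable, Variable_1 imitate the way TensorFlow's graph mode de-duplicates names. It shows why a checkpoint written from unnamed variables cannot be restored into variables named v1 and v2, and what happens when one name such as weight_var is reused for several layers.

```python
class ToyGraph:
    """Hypothetical stand-in for a TensorFlow graph: it only tracks variable names."""

    def __init__(self):
        self.variables = {}  # unique name -> current value

    def variable(self, value, name="Variable"):
        # De-duplicate names the way graph mode does: name, name_1, name_2, ...
        unique, suffix = name, 0
        while unique in self.variables:
            suffix += 1
            unique = "%s_%d" % (name, suffix)
        self.variables[unique] = value
        return unique


def save(graph):
    # A toy "checkpoint" is just a copy of the name -> value mapping.
    return dict(graph.variables)


def restore(graph, checkpoint):
    # Every variable in the graph must exist in the checkpoint under the same name.
    for name in graph.variables:
        if name not in checkpoint:
            raise LookupError('Tensor name "%s" not found in checkpoint' % name)
        graph.variables[name] = checkpoint[name]


# Save without explicit names, as in tensor_save_not_named_vars.py.
unnamed = ToyGraph()
unnamed.variable(1)
unnamed.variable(2)
unnamed_ckpt = save(unnamed)
print(sorted(unnamed_ckpt))

# Restore into variables named v1 and v2, as in tensor_restore.py.
target = ToyGraph()
target.variable(0, name="v1")
target.variable(0, name="v2")
try:
    restore(target, unnamed_ckpt)
    restore_error = None
except LookupError as err:
    restore_error = str(err)
print(restore_error)

# Reusing one name for every layer still gives distinct, order-dependent names.
layers = ToyGraph()
layer_names = [layers.variable(0.0, name="weight_var") for _ in range(3)]
print(layer_names)
```

Under these assumptions, the names stored in a checkpoint depend on the order in which the variables were created, so the restoring script has to create them in the same order, with or without explicit names.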

17
Tomas

A similar question is asked here: Tensorflow: how to save/restore a model? TL;DR: you need to recreate the model structure using the same sequence of TensorFlow API calls before using the Saver object to restore the weights.

This is suboptimal; follow GitHub issue #696 for progress on making this easier.

12
Yaroslav Bulatov

This problem is probably caused by name scope variations that appear when the same network is created twice.

Put the command:

tf.reset_default_graph()

before creating the network.
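
One way to picture what this answer describes: in graph mode, building the same network twice in one process (for example by re-running a notebook cell) adds a second set of variables to the default graph, and the duplicates get suffixed names that no longer match the checkpoint. The sketch below is plain Python rather than TensorFlow; default_graph_names, make_variable, build_network and the toy reset_default_graph are hypothetical stand-ins that only imitate the naming behaviour.

```python
# Hypothetical stand-in for the names registered in TensorFlow's default graph.
default_graph_names = []


def make_variable(name):
    # Imitate graph-mode de-duplication: name, name_1, name_2, ...
    unique, suffix = name, 0
    while unique in default_graph_names:
        suffix += 1
        unique = "%s_%d" % (name, suffix)
    default_graph_names.append(unique)
    return unique


def build_network():
    # Two variables per build, always created in the same order.
    return [make_variable("weight_var"), make_variable("bias_var")]


def reset_default_graph():
    # Toy counterpart of tf.reset_default_graph(): forget every registered name.
    del default_graph_names[:]


first_build = build_network()    # the names a checkpoint would contain
second_build = build_network()   # same code run again in the same graph
reset_default_graph()
after_reset = build_network()    # names line up with the checkpoint again

print(first_build)
print(second_build)
print(after_reset)
```

In this toy model the second build produces weight_var_1 and bias_var_1, which a checkpoint saved from the first build does not contain; after the reset the original names come back.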

1
Leo

If this kind of problem occurs, try restarting your kernel: the variables you just created conflict with the ones created earlier in the same session. That is why you see NotFoundError and other problems.

I ran into the same kind of problem, and restarting the kernel worked for me. (Caution: try not to run your code several times in the same kernel, because recreating the variables can ruin your model file by overwriting the existing ones and changing the original values.)

1
Mahesh_Tripathi

Make sure tf.train.Saver() is declared inside the with tf.Session() as sess block.

0
Cro