Efficiently grabbing gradients from TensorFlow?

Question

I'm trying to implement an asynchronous parameter server, DistBelief style, using TensorFlow. I found that minimize() is split into two functions, compute_gradients and apply_gradients, so my plan is to insert a network boundary between them. My question is how to evaluate all the gradients simultaneously and pull them out all at once. I understand that eval only evaluates the subgraph it needs, but it also returns only one tensor, not the chain of tensors required to compute that tensor.

How can I do this more efficiently? I took the Deep MNIST example as a starting point:

    import tensorflow as tf
    import download_mnist

    def weight_variable(shape, name):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial, name=name)

    def bias_variable(shape, name):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial, name=name)

    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    mnist = download_mnist.read_data_sets('MNIST_data', one_hot=True)
    session = tf.InteractiveSession()

    x = tf.placeholder("float", shape=[None, 784], name='x')
    x_image = tf.reshape(x, [-1, 28, 28, 1], name='reshape')
    y_ = tf.placeholder("float", shape=[None, 10], name='y_')

    W_conv1 = weight_variable([5, 5, 1, 32], 'W_conv1')
    b_conv1 = bias_variable([32], 'b_conv1')
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)

    W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
    b_conv2 = bias_variable([64], 'b_conv2')
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)

    W_fc1 = weight_variable([7 * 7 * 64, 1024], 'W_fc1')
    b_fc1 = bias_variable([1024], 'b_fc1')
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

    keep_prob = tf.placeholder("float", name='keep_prob')
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    W_fc2 = weight_variable([1024, 10], 'W_fc2')
    b_fc2 = bias_variable([10], 'b_fc2')
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

    loss = -tf.reduce_sum(y_ * tf.log(y_conv))
    optimizer = tf.train.AdamOptimizer(1e-4)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    compute_gradients = optimizer.compute_gradients(loss)
    session.run(tf.initialize_all_variables())

    batch = mnist.train.next_batch(50)
    feed_dict = {x: batch[0], y_: batch[1], keep_prob: 0.5}

    gradients = []
    for grad_var in compute_gradients:
        grad = grad_var[0].eval(feed_dict=feed_dict)
        var = grad_var[1]
        gradients.append((grad, var))

I think this last loop is recomputing the last gradient several times, while the first gradient is computed only once? How can I grab all the gradients without recomputing them?

myme5261314 · Accepted Answer

Here is a simple example. Understand it, then adapt it to your specific task.

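Initialize the required symbols.

    import tensorflow as tf

    # function1 and function2 stand for any computation on a gradient;
    # the bodies below are illustrative choices only.
    def function1(g):
        return 2 * g

    def function2(g):
        return 2 * g

    x = tf.Variable(0.5)
    y = x * x
    opt = tf.train.AdagradOptimizer(0.1)
    grads = opt.compute_gradients(y)

    # One placeholder per gradient, so values computed elsewhere can be fed in.
    grad_placeholder = [(tf.placeholder("float", shape=grad[1].get_shape()), grad[1])
                        for grad in grads]
    apply_placeholder_op = opt.apply_gradients(grad_placeholder)

    # Alternatively, transform the gradients inside the graph.
    transform_grads = [(function1(grad[0]), grad[1]) for grad in grads]
    apply_transform_op = opt.apply_gradients(transform_grads)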

Initialize

    sess = tf.Session()
    sess.run(tf.initialize_all_variables())

Get all the gradients

    # A single run call fetches every gradient tensor at once.
    grad_vals = sess.run([grad[0] for grad in grads])
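The reason one `run` call with a list of fetches helps is that the subgraph shared by the fetched tensors is evaluated once per call, whereas separate `eval` calls each start over. The following is a toy, framework-free sketch of that effect, not TensorFlow code: the functions `backward_shared`, `grad_w1`, `grad_w2` and `all_grads` are hypothetical names invented for this illustration, and the counter simply records how often the shared work runs.

```python
# Toy illustration (no TensorFlow): loss = (w2 * w1 * x) ** 2.
# Both gradients need the same forward values and the same upstream derivative.

calls = {"backward_shared": 0}

def backward_shared(x, w1, w2):
    """Work shared by both gradients: forward pass plus dloss/dy."""
    calls["backward_shared"] += 1
    h = w1 * x          # hidden value
    y = w2 * h          # output
    dloss_dy = 2.0 * y  # derivative of y ** 2
    return h, dloss_dy

def grad_w1(x, w1, w2):
    """Gradient w.r.t. w1, fetched on its own (like one eval call)."""
    h, dloss_dy = backward_shared(x, w1, w2)
    return dloss_dy * w2 * x

def grad_w2(x, w1, w2):
    """Gradient w.r.t. w2, fetched on its own (like another eval call)."""
    h, dloss_dy = backward_shared(x, w1, w2)
    return dloss_dy * h

def all_grads(x, w1, w2):
    """Both gradients fetched together (like one run call with a list)."""
    h, dloss_dy = backward_shared(x, w1, w2)
    return dloss_dy * w2 * x, dloss_dy * h

x, w1, w2 = 1.0, 2.0, 3.0

one_by_one = (grad_w1(x, w1, w2), grad_w2(x, w1, w2))
calls_one_by_one = calls["backward_shared"]

calls["backward_shared"] = 0
together = all_grads(x, w1, w2)
calls_together = calls["backward_shared"]

print(one_by_one, calls_one_by_one)
print(together, calls_together)
```

Both approaches return the same values; only the amount of repeated work differs, and the gap grows with the number of variables.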

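Apply the gradients

    # Feed the (possibly transformed) gradient values back through the placeholders.
    feed_dict = {}
    for i in range(len(grad_placeholder)):
        feed_dict[grad_placeholder[i][0]] = function2(grad_vals[i])
    sess.run(apply_placeholder_op, feed_dict=feed_dict)

    # Or apply the gradients that were transformed inside the graph.
    sess.run(apply_transform_op)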

Note: I have not tested this code myself, but I am confident it is valid apart from minor errors. Note: function1 and function2 stand for some computation, such as 2 * x, x ^ e, or e ^ x.
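For the network boundary the question describes, the placeholder route maps onto a worker/server split: the worker computes gradient values, ships them as bytes, and the server applies them. Below is a minimal sketch of that flow in plain Python, under loud assumptions: it does not use TensorFlow, `pickle` stands in for the network, plain gradient descent replaces Adagrad, and `ParameterServer` and `worker_compute_gradients` are hypothetical names made up for this example. The model is the same y = x * x as above.

```python
import pickle

class ParameterServer(object):
    """Hypothetical server: owns the variables and applies incoming gradients."""

    def __init__(self, params, learning_rate):
        self.params = dict(params)
        self.learning_rate = learning_rate

    def get_params(self):
        # Bytes that would travel over the network to a worker.
        return pickle.dumps(self.params)

    def apply_gradients(self, payload):
        # Plain gradient descent step (an assumption; the answer uses Adagrad).
        grads = pickle.loads(payload)
        for name, g in grads.items():
            self.params[name] -= self.learning_rate * g

def worker_compute_gradients(param_bytes):
    """Hypothetical worker: gradient of y = x * x with respect to x."""
    params = pickle.loads(param_bytes)
    grads = {"x": 2.0 * params["x"]}
    # Bytes that would travel back to the server.
    return pickle.dumps(grads)

server = ParameterServer({"x": 0.5}, learning_rate=0.1)
for step in range(3):
    payload = worker_compute_gradients(server.get_params())
    server.apply_gradients(payload)
    print(step, server.params["x"])
```

In a TensorFlow version of this sketch, the worker side would correspond to the single `sess.run` over the gradient tensors and the server side to running `apply_placeholder_op` with the received values in `feed_dict`.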

See also: TensorFlow apply_gradients remotely

Pinocchio · Answer

I wrote a very simple, commented example (inspired by the answer above) that you can run to see gradient descent in action:

    from __future__ import print_function
    import tensorflow as tf

    # function to transform gradients
    def T(g, decay=1.0):
        # return decayed gradient
        return decay * g

    # x variable
    x = tf.Variable(10.0, name='x')
    # b placeholder (simulates the "data" part of the training)
    b = tf.placeholder(tf.float32)
    # make model (1/2)(x-b)^2
    xx_b = 0.5 * tf.pow(x - b, 2)
    y = xx_b

    learning_rate = 1.0
    opt = tf.train.GradientDescentOptimizer(learning_rate)
    # gradient variable list = [ (gradient, variable) ]
    gv = opt.compute_gradients(y, [x])
    # transformed gradient variable list = [ (T(gradient), variable) ]
    decay = 0.1  # decay the gradient for the sake of the example
    tgv = [(T(g, decay=decay), v) for (g, v) in gv]
    # apply transformed gradients (here scaled by decay)
    apply_transform_op = opt.apply_gradients(tgv)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        epochs = 10
        for i in range(epochs):
            b_val = 1.0  # fake data (in SGD it would be different on every epoch)
            print('----')
            x_before_update = x.eval()
            print('before update', x_before_update)

            # compute gradients
            grad_vals = sess.run([g for (g, v) in gv], feed_dict={b: b_val})
            print('grad_vals: ', grad_vals)
            # apply the gradients
            result = sess.run(apply_transform_op, feed_dict={b: b_val})

            print('value of x should be: ',
                  x_before_update - T(grad_vals[0], decay=decay))
            x_after_update = x.eval()
            print('after update', x_after_update)

You can watch the variable change as it learns, along with the value of the gradient. Note that the only reason T decays the gradient is that otherwise it would reach the global minimum in one step.
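The one-step claim follows from the model: for y = 0.5 * (x - b)^2 the gradient is x - b, so with a learning rate of 1.0 the update x - (x - b) lands exactly on b. A small plain-Python check of that arithmetic, independent of TensorFlow (the helper name `step` is made up for this sketch):

```python
def step(x, b, learning_rate, decay):
    """One gradient descent update for y = 0.5 * (x - b) ** 2."""
    grad = x - b  # dy/dx
    return x - learning_rate * decay * grad

# Without decay, a single step jumps straight to the minimum at x = b.
x_no_decay = step(10.0, 1.0, learning_rate=1.0, decay=1.0)

# With decay = 0.1, each step only closes 10% of the remaining gap.
x_decay = 10.0
trajectory = []
for _ in range(10):
    x_decay = step(x_decay, 1.0, learning_rate=1.0, decay=0.1)
    trajectory.append(x_decay)

print(x_no_decay)
print(trajectory)
```

With decay 0.1 the distance to the minimum shrinks by a factor of 0.9 per step, which is why the printed values approach 1.0 gradually instead of jumping there.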

As a bonus, if you want to make it work with TensorBoard, here you go! :)

    ## run cmd to collect model: python quadratic_minimizer.py --logdir=/tmp/quaratic_temp
    ## show board on browser run cmd: tensorboard --logdir=/tmp/quaratic_temp
    ## browser: http://localhost:6006/
    from __future__ import print_function
    import tensorflow as tf

    # function to transform gradients
    def T(g, decay=1.0):
        # return decayed gradient
        return decay * g

    # x variable
    x = tf.Variable(10.0, name='x')
    # b placeholder (simulates the "data" part of the training)
    b = tf.placeholder(tf.float32)
    # make model (1/2)(x-b)^2
    xx_b = 0.5 * tf.pow(x - b, 2)
    y = xx_b

    learning_rate = 1.0
    opt = tf.train.GradientDescentOptimizer(learning_rate)
    # gradient variable list = [ (gradient, variable) ]
    gv = opt.compute_gradients(y, [x])
    # transformed gradient variable list = [ (T(gradient), variable) ]
    decay = 0.9  # decay the gradient for the sake of the example
    tgv = [(T(g, decay=decay), v) for (g, v) in gv]
    # apply transformed gradients (here scaled by decay)
    apply_transform_op = opt.apply_gradients(tgv)

    (dydx, _) = tgv[0]
    x_scalar_summary = tf.scalar_summary("x", x)
    grad_scalar_summary = tf.scalar_summary("dydx", dydx)

    with tf.Session() as sess:
        merged = tf.merge_all_summaries()
        tensorboard_data_dump = '/tmp/quaratic_temp'
        writer = tf.train.SummaryWriter(tensorboard_data_dump, sess.graph)

        sess.run(tf.initialize_all_variables())
        epochs = 14
        for i in range(epochs):
            b_val = 1.0  # fake data (in SGD it would be different on every epoch)
            print('----')
            x_before_update = x.eval()
            print('before update', x_before_update)

            # get gradients
            # grad_list = [g for (g, v) in gv]
            (summary_str_grad, grad_val) = sess.run([merged] + [dydx],
                                                    feed_dict={b: b_val})
            grad_vals = sess.run([g for (g, v) in gv], feed_dict={b: b_val})
            print('grad_vals: ', grad_vals)
            writer.add_summary(summary_str_grad, i)

            # apply the gradients
            [summary_str_apply_transform, _] = sess.run(
                [merged, apply_transform_op], feed_dict={b: b_val})
            writer.add_summary(summary_str_apply_transform, i)

            print('value of x after update should be: ',
                  x_before_update - T(grad_vals[0], decay=decay))
            x_after_update = x.eval()
            print('after update', x_after_update)