TensorFlow for binary classification

Question

I am trying to adapt this MNIST example to binary classification.

But when I change NLABELS from NLABELS=2 to NLABELS=1, the loss function always returns 0 (and the accuracy is always 1).

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Import data
mnist = input_data.read_data_sets('data', one_hot=True)
NLABELS = 2

sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder(tf.float32, [None, 784], name='x-input')
W = tf.Variable(tf.zeros([784, NLABELS]), name='weights')
b = tf.Variable(tf.zeros([NLABELS], name='bias'))

y = tf.nn.softmax(tf.matmul(x, W) + b)

# Add summary ops to collect data
_ = tf.histogram_summary('weights', W)
_ = tf.histogram_summary('biases', b)
_ = tf.histogram_summary('y', y)

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, NLABELS], name='y-input')

# More name scopes will clean up the graph representation
with tf.name_scope('cross_entropy'):
    cross_entropy = -tf.reduce_mean(y_ * tf.log(y))
    _ = tf.scalar_summary('cross entropy', cross_entropy)
with tf.name_scope('train'):
    train_step = tf.train.GradientDescentOptimizer(10.).minimize(cross_entropy)

with tf.name_scope('test'):
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    _ = tf.scalar_summary('accuracy', accuracy)

# Merge all the summaries and write them out to /tmp/mnist_logs
merged = tf.merge_all_summaries()
writer = tf.train.SummaryWriter('logs', sess.graph_def)
tf.initialize_all_variables().run()

# Train the model, and feed in test data and record summaries every 10 steps
for i in range(1000):
    if i % 10 == 0:  # Record summary data and the accuracy
        labels = mnist.test.labels[:, 0:NLABELS]
        feed = {x: mnist.test.images, y_: labels}

        result = sess.run([merged, accuracy, cross_entropy], feed_dict=feed)
        summary_str = result[0]
        acc = result[1]
        loss = result[2]
        writer.add_summary(summary_str, i)
        print('Accuracy at step %s: %s - loss: %f' % (i, acc, loss))
    else:
        batch_xs, batch_ys = mnist.train.next_batch(100)
        batch_ys = batch_ys[:, 0:NLABELS]
        feed = {x: batch_xs, y_: batch_ys}
        sess.run(train_step, feed_dict=feed)
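To see what the slice labels[:, 0:NLABELS] actually feeds into y_, here is a small pure-Python sketch (no TensorFlow or NumPy; the three sample labels and the helper names are hypothetical) that slices one-hot MNIST-style rows the same way.

```python
# Hypothetical stand-in for mnist.test.labels: one-hot rows with 10 columns (digits 0-9).
def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at index `digit` and 0.0 everywhere else."""
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]


def first_columns(rows, n):
    """Mimic the NumPy slice rows[:, 0:n] on a plain list of lists."""
    return [row[0:n] for row in rows]


labels = [one_hot(d) for d in (0, 1, 7)]  # images of the digits 0, 1 and 7

# NLABELS = 2: digits 2-9 end up with an all-zero target row.
print(first_columns(labels, 2))
# NLABELS = 1: the single column is 1.0 only when the digit is 0.
print(first_columns(labels, 1))
```

With NLABELS = 1 the target only says "is this image a 0?", so the task becomes 0-versus-rest; with NLABELS = 2, rows for the digits 2 to 9 are all zeros and do not describe either class.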

I checked the dimensions of both batch_ys (fed into y_) and y_, and they are both 1xN matrices when NLABELS=1, so the problem seems to come before that. Maybe something to do with the matrix multiplication?

I actually have this same problem in a real project, so any help would be appreciated... Thanks!

mrry · Accepted Answer

The original MNIST example uses a one-hot encoding to represent the labels in the data: this means that if there are NLABELS = 10 classes (as in MNIST), the target output is [1 0 0 0 0 0 0 0 0 0] for class 0, [0 1 0 0 0 0 0 0 0 0] for class 1, etc. The tf.nn.softmax() operator converts the logits computed by tf.matmul(x, W) + b into a probability distribution across the different output classes, which is then compared to the fed-in value for y_.
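The softmax-plus-one-hot comparison described above can be sketched in pure Python for a single example. This mirrors the math only (it is not TensorFlow's implementation), and the three logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the largest logit so exp() cannot overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes):
    """One-hot target: 1.0 at index `label`, 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


logits = [2.0, 1.0, 0.1]   # hypothetical scores for a 3-class example
probs = softmax(logits)     # a probability distribution over the 3 classes
target = one_hot(0, 3)      # [1.0, 0.0, 0.0]: the true class is 0

# Per-example cross-entropy, -sum(y_ * log(y)): only the true class's term survives.
loss = -sum(t * math.log(p) for t, p in zip(target, probs))
print(probs, loss)
```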

If NLABELS = 1, this acts as if there were only a single class, and the tf.nn.softmax() op would compute a probability of 1.0 for that class, leading to a cross-entropy of 0.0, since tf.log(1.0) is 0.0 for all of the examples.
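The degenerate single-class case can be checked in a few lines of pure Python (a sketch of the math, not TensorFlow itself). It also accounts for the accuracy of 1 in the question: an argmax over a single column always returns index 0, for y and y_ alike.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value, like tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])


# With one output column, softmax normalizes a single number by itself.
for logit in (-5.0, 0.0, 3.7):       # arbitrary logits
    probs = softmax([logit])          # always [1.0]
    for target in (0.0, 1.0):         # whatever label is fed into y_
        loss = -(target * math.log(probs[0]))            # log(1.0) == 0.0, so loss is 0
        same_class = argmax(probs) == argmax([target])   # both are always index 0
        print(logit, target, probs, loss, same_class)
```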

There are (at least) two approaches you could try for binary classification:

1. The simplest would be to set NLABELS = 2 for the two possible classes, and encode your training data as [1 0] for label 0 and [0 1] for label 1. This answer has a suggestion for how to do that.
2. You can keep the labels as the integers 0 and 1 and use tf.nn.sparse_softmax_cross_entropy_with_logits(), as suggested in this answer.
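Both options compute the same per-example loss; they differ only in how the label is encoded. The pure-Python sketch below (hypothetical logits and helper names; it mirrors the math, not TensorFlow's implementation) compares a one-hot target [1 0] / [0 1] with an integer target 0 / 1. In TensorFlow, tf.nn.sparse_softmax_cross_entropy_with_logits() takes integer labels of shape [batch_size] alongside logits of shape [batch_size, 2].

```python
import math


def log_softmax(logits):
    """Log-probabilities, computed stably by factoring out the largest logit."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_total for v in logits]


def dense_xent(logits, one_hot_target):
    """Option 1: one-hot target such as [1, 0] or [0, 1]."""
    return -sum(t * lp for t, lp in zip(one_hot_target, log_softmax(logits)))


def sparse_xent(logits, label):
    """Option 2: integer target 0 or 1; pick out that class's log-probability."""
    return -log_softmax(logits)[label]


logits = [0.3, -1.2]  # hypothetical 2-class scores for one example
for label in (0, 1):
    one_hot = [1.0 if i == label else 0.0 for i in range(2)]
    print(label, dense_xent(logits, one_hot), sparse_xent(logits, label))
```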

Troy D · Answer

I've been looking for good examples of how to implement binary classification in TensorFlow in a similar way to how it would be done in Keras. I didn't find any, but after digging through the code a bit, I think I have it figured out. I modified the problem here to implement a solution that uses sigmoid_cross_entropy_with_logits, the way Keras does under the hood.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Import data
mnist = input_data.read_data_sets('data', one_hot=True)
NLABELS = 1

sess = tf.InteractiveSession()

# Create the model
x = tf.placeholder(tf.float32, [None, 784], name='x-input')
W = tf.get_variable('weights', [784, NLABELS],
                    initializer=tf.truncated_normal_initializer()) * 0.1
b = tf.Variable(tf.zeros([NLABELS], name='bias'))
logits = tf.matmul(x, W) + b

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, NLABELS], name='y-input')

# More name scopes will clean up the graph representation
with tf.name_scope('cross_entropy'):
    # Manual calculation (the math under the hood); don't use this, it will have gradient problems:
    # entropy = tf.multiply(tf.log(tf.sigmoid(logits)), y_) + tf.multiply((1 - y_), tf.log(1 - tf.sigmoid(logits)))
    # loss = -tf.reduce_mean(entropy, name='loss')

    entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=logits)
    loss = tf.reduce_mean(entropy, name='loss')

with tf.name_scope('train'):
    # Using Adam instead of plain gradient descent:
    # train_step = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
    train_step = tf.train.AdamOptimizer(learning_rate=0.002).minimize(loss)

with tf.name_scope('test'):
    # sigmoid(0) = 0.5, so threshold the probability (not the raw logit) at 0.5
    preds = tf.cast(tf.sigmoid(logits) > 0.5, tf.float32)
    correct_prediction = tf.equal(preds, y_)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

tf.initialize_all_variables().run()

# Train the model, and feed in test data and record the loss and accuracy every 100 steps
for i in range(2000):
    if i % 100 == 0:  # Record the loss and accuracy on the test set
        labels = mnist.test.labels[:, 0:NLABELS]
        feed = {x: mnist.test.images, y_: labels}
        result = sess.run([loss, accuracy], feed_dict=feed)
        print('Accuracy at step %s: %s - loss: %f' % (i, result[1], result[0]))
    else:
        batch_xs, batch_ys = mnist.train.next_batch(100)
        batch_ys = batch_ys[:, 0:NLABELS]
        feed = {x: batch_xs, y_: batch_ys}
        sess.run(train_step, feed_dict=feed)
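The commented-out manual formula and tf.nn.sigmoid_cross_entropy_with_logits compute the same loss. TensorFlow's documentation gives the rearranged form max(x, 0) - x * z + log(1 + exp(-abs(x))), with x = logits and z = labels, which never takes the log of 0. Below is a pure-Python sketch of both forms (an illustration of the formula, not TensorFlow's code; the sample logits are hypothetical), together with the matching decision rule: since sigmoid(0) = 0.5, predicting class 1 when sigmoid(logits) > 0.5 is the same test as logits > 0.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def naive_xent(x, z):
    """The commented-out formula: -[z*log(sigmoid(x)) + (1-z)*log(1 - sigmoid(x))]."""
    s = sigmoid(x)
    return -(z * math.log(s) + (1 - z) * math.log(1 - s))


def stable_xent(x, z):
    """Rearranged form documented for tf.nn.sigmoid_cross_entropy_with_logits."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def predict(x):
    """Class 1 when sigmoid(x) > 0.5, i.e. when the logit itself is positive."""
    return 1.0 if x > 0 else 0.0


for x, z in [(1.5, 1.0), (-0.7, 0.0), (-2.0, 1.0)]:
    print(x, z, naive_xent(x, z), stable_xent(x, z), predict(x))

# For a large logit, sigmoid(40.0) rounds to exactly 1.0 in float64, so the naive
# formula would need log(0.0) (math.log raises ValueError); the stable form does not.
print(stable_xent(40.0, 0.0))
```

In float32, as used in the TensorFlow graph, the sigmoid saturates at much smaller logits, which is one reason the manual version misbehaves during training.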

Training:

Accuracy at step 0: 0.7373 - loss: 0.758670
Accuracy at step 100: 0.9017 - loss: 0.423321
Accuracy at step 200: 0.9031 - loss: 0.322541
Accuracy at step 300: 0.9085 - loss: 0.255705
Accuracy at step 400: 0.9188 - loss: 0.209892
Accuracy at step 500: 0.9308 - loss: 0.178372
Accuracy at step 600: 0.9453 - loss: 0.155927
Accuracy at step 700: 0.9507 - loss: 0.139031
Accuracy at step 800: 0.9556 - loss: 0.125855
Accuracy at step 900: 0.9607 - loss: 0.115340
Accuracy at step 1000: 0.9633 - loss: 0.106709
Accuracy at step 1100: 0.9667 - loss: 0.099286
Accuracy at step 1200: 0.971 - loss: 0.093048
Accuracy at step 1300: 0.9714 - loss: 0.087915
Accuracy at step 1400: 0.9745 - loss: 0.083300
Accuracy at step 1500: 0.9745 - loss: 0.079019
Accuracy at step 1600: 0.9761 - loss: 0.075164
Accuracy at step 1700: 0.9768 - loss: 0.071803
Accuracy at step 1800: 0.9777 - loss: 0.068825
Accuracy at step 1900: 0.9788 - loss: 0.066270