How to implement TensorFlow's next_batch for your own data

Question

In the TensorFlow MNIST tutorial, the function mnist.train.next_batch(100) is very handy. I am now trying to implement a simple classification myself. My training data is in a NumPy array. How could I implement a similar function for my own data that gives me the next batch?

    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()

    Xtr, Ytr = loadData()
    for it in range(1000):
        batch_x = Xtr.next_batch(100)
        batch_y = Ytr.next_batch(100)

edo · Accepted Answer

The link you posted says: "we get a 'batch' of one hundred random data points from our training set". In my example I use a global function (not a method as in your example), so the syntax will differ.

My function takes the number of samples you want and the data array.

Here is the correct code, which ensures that each sample keeps its correct label:

    import numpy as np

    def next_batch(num, data, labels):
        '''
        Return a total of `num` random samples and labels.
        '''
        idx = np.arange(0, len(data))
        np.random.shuffle(idx)
        idx = idx[:num]
        data_shuffle = [data[i] for i in idx]
        labels_shuffle = [labels[i] for i in idx]

        return np.asarray(data_shuffle), np.asarray(labels_shuffle)

    Xtr, Ytr = np.arange(0, 10), np.arange(0, 100).reshape(10, 10)
    print(Xtr)
    print(Ytr)

    Xtr, Ytr = next_batch(5, Xtr, Ytr)
    print('\n5 random samples')
    print(Xtr)
    print(Ytr)

And a demo run:

    [0 1 2 3 4 5 6 7 8 9]
    [[ 0  1  2  3  4  5  6  7  8  9]
     [10 11 12 13 14 15 16 17 18 19]
     [20 21 22 23 24 25 26 27 28 29]
     [30 31 32 33 34 35 36 37 38 39]
     [40 41 42 43 44 45 46 47 48 49]
     [50 51 52 53 54 55 56 57 58 59]
     [60 61 62 63 64 65 66 67 68 69]
     [70 71 72 73 74 75 76 77 78 79]
     [80 81 82 83 84 85 86 87 88 89]
     [90 91 92 93 94 95 96 97 98 99]]

    5 random samples
    [9 1 5 6 7]
    [[90 91 92 93 94 95 96 97 98 99]
     [10 11 12 13 14 15 16 17 18 19]
     [50 51 52 53 54 55 56 57 58 59]
     [60 61 62 63 64 65 66 67 68 69]
     [70 71 72 73 74 75 76 77 78 79]]
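To connect this back to the loop in the question, here is a minimal sketch of how such a function might be called once per training step. It uses NumPy fancy indexing instead of the list comprehensions, which assumes `data` and `labels` are NumPy arrays. The toy arrays stand in for `loadData()`, and the names `train_step`, `x` and `y_` in the comment are hypothetical placeholders, not taken from the question.

```python
import numpy as np

def next_batch(num, data, labels):
    """Return `num` random samples and their matching labels."""
    # One shared index array keeps every sample paired with its label.
    idx = np.random.permutation(len(data))[:num]
    return data[idx], labels[idx]

# Hypothetical stand-in for loadData(): 1000 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
Xtr = rng.normal(size=(1000, 4))
Ytr = rng.integers(0, 3, size=1000)

for it in range(5):
    batch_x, batch_y = next_batch(100, Xtr, Ytr)
    # In a TensorFlow 1.x script the batch would be fed here, for example:
    # sess.run(train_step, feed_dict={x: batch_x, y_: batch_y})
    print(it, batch_x.shape, batch_y.shape)
```

Note that each call draws a fresh random subset, so a sample can appear in several consecutive batches while others are not seen for a while.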

Brother_Mumu · Answer

To shuffle and sample each mini-batch properly, you should also keep track of whether a sample has already been selected within the current epoch. Here is an implementation that uses the data from the answer above.

    import numpy as np

    class Dataset:

        def __init__(self, data):
            self._index_in_Epoch = 0
            self._epochs_completed = 0
            self._data = data
            self._num_examples = data.shape[0]

        @property
        def data(self):
            return self._data

        def next_batch(self, batch_size, shuffle=True):
            start = self._index_in_Epoch
            # shuffle once before the very first batch
            if shuffle and start == 0 and self._epochs_completed == 0:
                idx = np.arange(0, self._num_examples)  # all possible indexes
                np.random.shuffle(idx)                  # shuffle the indexes
                self._data = self.data[idx]

            # the batch runs past the end of the current epoch
            if start + batch_size > self._num_examples:
                self._epochs_completed += 1
                rest_num_examples = self._num_examples - start
                data_rest_part = self.data[start:self._num_examples]
                if shuffle:
                    idx0 = np.arange(0, self._num_examples)
                    np.random.shuffle(idx0)
                    self._data = self.data[idx0]  # reshuffle for the next epoch

                start = 0
                # handles the case where the number of samples is not
                # an integer multiple of batch_size
                self._index_in_Epoch = batch_size - rest_num_examples
                end = self._index_in_Epoch
                data_new_part = self._data[start:end]
                return np.concatenate((data_rest_part, data_new_part), axis=0)
            else:
                self._index_in_Epoch += batch_size
                end = self._index_in_Epoch
                return self._data[start:end]

    dataset = Dataset(np.arange(0, 10))
    for i in range(10):
        print(dataset.next_batch(5))

The output is:

    [2 8 6 3 4]
    [1 5 9 0 7]
    [1 7 3 0 8]
    [2 6 5 9 4]
    [1 0 4 8 3]
    [7 6 2 9 5]
    [9 5 4 6 2]
    [0 1 8 7 3]
    [9 7 8 1 6]
    [3 5 2 4 0]

The first and second mini-batches (likewise the 3rd and 4th, and so on) together make up one full epoch.
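The class above handles a single array. A minimal sketch of the same epoch-aware idea for paired data and labels is shown below, written as a generator rather than a stateful class. This is a simplified variant, not the answer's method: the function name `epoch_batches` is made up for illustration, both inputs are assumed to be NumPy arrays of equal length, and a short final batch is returned instead of being topped up from the next epoch.

```python
import numpy as np

def epoch_batches(data, labels, batch_size, rng=None):
    """Yield (data, labels) mini-batches that together cover one shuffled epoch."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(len(data))  # one permutation shared by both arrays
    for start in range(0, len(data), batch_size):
        idx = perm[start:start + batch_size]  # the last batch may be shorter
        yield data[idx], labels[idx]

data = np.arange(10)
labels = data * 10  # label i * 10 belongs to sample i, so pairing is easy to check

for epoch in range(2):
    for batch_x, batch_y in epoch_batches(data, labels, batch_size=4):
        print(epoch, batch_x, batch_y)
```

Within one epoch every sample appears exactly once, and each call to `epoch_batches` starts a new epoch with a fresh shuffle.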

Sohaib Anwaar · Answer

I tried the algorithm from the answer marked above, but I did not get results with it, so I searched Kaggle and found an algorithm that works really well. Give it a try. In the code below, the global variables X_train and y_train refer to the arrays you declared earlier when reading your dataset.

    import numpy as np

    # X_train and y_train are the arrays loaded earlier
    epochs_completed = 0
    index_in_Epoch = 0
    num_examples = X_train.shape[0]

    # for splitting out batches of data
    def next_batch(batch_size):

        global X_train
        global y_train
        global index_in_Epoch
        global epochs_completed

        start = index_in_Epoch
        index_in_Epoch += batch_size

        # when all training data has already been used, reorder it randomly
        if index_in_Epoch > num_examples:
            # finished epoch
            epochs_completed += 1
            # shuffle the data
            perm = np.arange(num_examples)
            np.random.shuffle(perm)
            X_train = X_train[perm]
            y_train = y_train[perm]
            # start next epoch
            start = 0
            index_in_Epoch = batch_size
            assert batch_size <= num_examples
        end = index_in_Epoch
        return X_train[start:end], y_train[start:end]
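Two properties of this approach are easy to miss, and the sketch below makes them visible on toy data. The arrays here are hypothetical stand-ins for a real dataset, and the function body is condensed but follows the same logic. The first pass is served in the original order, and when the number of examples is not a multiple of the batch size, the leftover samples at the end of an epoch are skipped.

```python
import numpy as np

# Hypothetical toy data standing in for the arrays loaded earlier.
X_train = np.arange(10)
y_train = X_train * 10

epochs_completed = 0
index_in_epoch = 0
num_examples = X_train.shape[0]

def next_batch(batch_size):
    global X_train, y_train, index_in_epoch, epochs_completed
    start = index_in_epoch
    index_in_epoch += batch_size
    if index_in_epoch > num_examples:
        # epoch finished: reshuffle both arrays with the same permutation
        epochs_completed += 1
        perm = np.random.permutation(num_examples)
        X_train, y_train = X_train[perm], y_train[perm]
        start = 0
        index_in_epoch = batch_size
        assert batch_size <= num_examples
    return X_train[start:index_in_epoch], y_train[start:index_in_epoch]

first, _ = next_batch(4)         # samples 0..3, first epoch is not shuffled
second, _ = next_batch(4)        # samples 4..7
third, third_y = next_batch(4)   # reshuffles; samples 8 and 9 were never served
print(first, second, third, epochs_completed)
```

If the unshuffled first pass matters, shuffling `X_train` and `y_train` once with a shared permutation before the first call is one way to address it.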

Sergiu I · Answer

I use Anaconda and Jupyter. In Jupyter, if you run ?mnist you get:

    File: c:\programdata\anaconda3\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py
    Docstring: Datasets(train, validation, test)

In the datasets folder you will find mnist.py, which contains all the methods, including next_batch.
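Outside Jupyter, the standard library's `inspect` module offers a similar way to locate and read the source of an object. The sketch below uses `json.JSONDecoder` purely as a stand-in, since TensorFlow may not be installed; the same calls should work on any class or function that is defined in Python source.

```python
import inspect
import json  # stand-in module; any imported Python module, class or function works

# Path of the file that defines the object, similar to the "File:" line of `?obj`.
print(inspect.getsourcefile(json.JSONDecoder))

# The first few lines of a method's source code.
source = inspect.getsource(json.JSONDecoder.decode)
print("\n".join(source.splitlines()[:5]))
```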

Aakash Saxena · Answer

If you do not want to get a shape mismatch error when running your TensorFlow session, use the function below instead of the one provided in the first solution above (https://stackoverflow.com/a/40995666/7748451):

    import numpy as np

    def next_batch(num, data, labels):
        '''
        Return a total of `num` random samples and labels.
        '''
        idx = np.arange(0, len(data))
        np.random.shuffle(idx)
        idx = idx[:num]
        data_shuffle = data[idx]
        labels_shuffle = labels[idx]
        # `.values` implies that `labels` is a pandas object;
        # the reshape turns the labels into a column of shape (num, 1)
        labels_shuffle = np.asarray(labels_shuffle.values.reshape(len(labels_shuffle), 1))

        return data_shuffle, labels_shuffle
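For labels held in a plain NumPy array, which has no `.values` attribute, a minimal sketch of the same idea could look like this. It assumes the mismatch comes from the model expecting labels of shape `(num, 1)` while the array has shape `(num,)`; whether that applies depends on how your placeholders are defined.

```python
import numpy as np

def next_batch(num, data, labels):
    """Return `num` random samples and their labels, with labels as a column."""
    idx = np.random.permutation(len(data))[:num]
    data_batch = np.asarray(data)[idx]
    # reshape(-1, 1) turns shape (num,) into (num, 1)
    labels_batch = np.asarray(labels)[idx].reshape(-1, 1)
    return data_batch, labels_batch

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
batch_x, batch_y = next_batch(5, X, y)
print(batch_x.shape, batch_y.shape)  # (5, 2) (5, 1)
```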