Keras: test, cross validation and accuracy while processing batched data with train_on_batch


Problem description

Can someone point me to a complete example that does all of the following?

  • Fits batched (and pickled) data in a loop using train_on_batch()
  • Sets aside data from each batch for validation purposes
  • Sets aside test data for accuracy evaluation after all batches have been processed (see last line of my example below).

I'm finding lots of 1- to 5-line code snippets on the internet illustrating how to call train_on_batch() or fit_generator(), but so far nothing that clearly illustrates how to separate out and handle both validation and test data while using train_on_batch().

F. Chollet's great example Cifar10_cnn (https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py) does not illustrate all of the points I listed above.

You can say, "Hey, handling test data is your problem. Do it manually." Fine! But I don't understand what these routines do well enough to even know if that is necessary. They are mostly black boxes, and for all I know they handle validation and test data automagically under the hood. My hope is that a more complete example would clear up the confusion.

For instance, in the example below where I read batches iteratively from pickle files, how would I modify the call to train_on_batch to handle validation_data? And how do I set aside test data (test_x & test_y) for purposes of evaluating accuracy at the end of the algorithm?

# fvecs and fpols are file handles, opened earlier, on the pickled
# feature vectors and polarity labels respectively.
while 1:
    try:
        batch = np.array(pickle.load(fvecs))
        polarities = np.array(pickle.load(fpols))

        # Divide a batch of 1000 documents (movie reviews) into:
        # 800 rows of training data, and
        # 200 rows of test (validation?) data
        train_x, val_x, train_y, val_y = train_test_split(batch, polarities, test_size=0.2)

        doc_size = 30
        x_batch = pad_sequences(train_x, maxlen=doc_size)
        y_batch = train_y

        # Fit the model
        model.train_on_batch(x_batch, y_batch)
        # model.fit(train_x, train_y, validation_data=(val_x, val_y), epochs=2, batch_size=800, verbose=2)

    except EOFError:
        print("EOF detected.")
        break

# Final evaluation of the model
scores = model.evaluate(test_x, test_y, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))

Answer

I can't supply you with a complete example, but as you can see in the Keras source, the Model class exposes both train_on_batch and test_on_batch, which suggests that train_on_batch should only train the model and not test it.

Just to be extra sure, you can see in the code itself that the function uses the entire batch for training and that nothing is set aside for testing or validation.

For your convenience I'm quoting the relevant code below:

def train_on_batch(self, x, y,
                   sample_weight=None,
                   class_weight=None):
    """Runs a single gradient update on a single batch of data.
    # Arguments
        x: Numpy array of training data,
            or list of Numpy arrays if the model has multiple inputs.
            If all inputs in the model are named,
            you can also pass a dictionary
            mapping input names to Numpy arrays.
        y: Numpy array of target data,
            or list of Numpy arrays if the model has multiple outputs.
            If all outputs in the model are named,
            you can also pass a dictionary
            mapping output names to Numpy arrays.
        sample_weight: Optional array of the same length as x, containing
            weights to apply to the model's loss for each sample.
            In the case of temporal data, you can pass a 2D array
            with shape (samples, sequence_length),
            to apply a different weight to every timestep of every sample.
            In this case you should make sure to specify
            sample_weight_mode="temporal" in compile().
        class_weight: Optional dictionary mapping
            class indices (integers) to
            a weight (float) to apply to the model's loss for the samples
            from this class during training.
            This can be useful to tell the model to "pay more attention" to
            samples from an under-represented class.
    # Returns
        Scalar training loss
        (if the model has a single output and no metrics)
        or list of scalars (if the model has multiple outputs
        and/or metrics). The attribute `model.metrics_names` will give you
        the display labels for the scalar outputs.
    """
    x, y, sample_weights = self._standardize_user_data(
        x, y,
        sample_weight=sample_weight,
        class_weight=class_weight,
        check_batch_axis=True)
    if self.uses_learning_phase and not isinstance(K.learning_phase(), int):
        ins = x + y + sample_weights + [1.]
    else:
        ins = x + y + sample_weights
    self._make_train_function()
    outputs = self.train_function(ins)
    if len(outputs) == 1:
        return outputs[0]
    return outputs
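So the splitting has to happen in your own loop: hold out a fixed test set once before training, split each incoming batch, train on the training rows with train_on_batch, and score the validation rows with test_on_batch (which runs the model without a gradient update). Below is a minimal sketch of that data flow. The StubModel class is a hypothetical stand-in so the loop runs without Keras; with a real compiled Keras model you would call the same three methods on it, and the shapes and split sizes here are just illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class StubModel:
    """Hypothetical stand-in for a compiled Keras model (same method names)."""
    def train_on_batch(self, x, y):
        # A real model would run one gradient update here and return the loss.
        return float(np.mean((x.sum(axis=1) - y) ** 2))
    def test_on_batch(self, x, y):
        # A real model would compute loss/metrics WITHOUT updating weights.
        return float(np.mean((x.sum(axis=1) - y) ** 2))
    def evaluate(self, x, y, verbose=0):
        # A real model returns [loss, accuracy] when compiled with metrics=['accuracy'].
        return [self.test_on_batch(x, y), 0.5]

model = StubModel()

# 1) Hold out a fixed test set ONCE, before the batch loop ever starts.
test_x = rng.normal(size=(200, 30))
test_y = rng.integers(0, 2, size=200).astype(float)

val_losses = []
for _ in range(5):  # stands in for "while pickle files still have batches"
    batch = rng.normal(size=(1000, 30))
    labels = rng.integers(0, 2, size=1000).astype(float)

    # 2) 80/20 split per batch: train_on_batch sees only the 800 training rows...
    idx = rng.permutation(len(batch))
    train_idx, val_idx = idx[:800], idx[800:]
    model.train_on_batch(batch[train_idx], labels[train_idx])

    # 3) ...and test_on_batch scores the 200 validation rows, no weight update.
    val_losses.append(model.test_on_batch(batch[val_idx], labels[val_idx]))

# 4) Only after all batches are done, touch the held-out test set.
print(len(val_losses))  # one validation score per batch
scores = model.evaluate(test_x, test_y, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
```

The key point is that the test set is carved out before the loop and never passed to train_on_batch, while validation happens per batch via test_on_batch, mirroring what validation_data does for model.fit.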
