Splitting data into training, validation, and test sets when building a Keras model

Views: 57 · Published: 2021/4/29 20:50:09 · Tags: python, tensorflow, machine-learning, keras, deep-learning


Problem description

I'm a little confused about splitting the dataset when building and evaluating Keras machine learning models. Let's say I have a dataset of 1000 rows.

features = df.iloc[:,:-1]
results = df.iloc[:,-1]

Now I want to split this data into training and testing (33% of data for testing, 67% for training):

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, results, test_size=0.33)

I have read on the internet that fitting the model to the data should look like this:

history = model.fit(features, results, validation_split = 0.2, epochs = 10, batch_size=50)

So I'm fitting the full data (features and results) to my model, and from that data I'm using 20% for validation: validation_split = 0.2. So basically, my model will be trained on 80% of the data and tested on the other 20%.

The confusion starts when I need to evaluate the model:

score = model.evaluate(X_test, y_test, batch_size=50)

Is this correct? I mean, why should I split the data into training and testing sets, and where do X_train and y_train go?

Can you please explain what the correct order of steps is for creating a model?

Solution

Generally, at training time (model.fit), you have two sets: a training set and a validation (also called tuning or development) set. You train the model on the training set and use the validation set to find the best hyperparameters. When you're done, you can test the model on an unseen data set, one that, unlike the training and validation sets, was completely hidden from the model.
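As a rough illustration of the three sets described above, the sketch below shuffles row indices and cuts them into disjoint training, validation, and test parts using only the standard library. The function name `three_way_split`, the default fractions, and the seed are assumptions made for this example; on real data you would more likely call train_test_split (possibly twice).

```python
import random


def three_way_split(n_rows, test_frac=0.33, val_frac=0.2, seed=0):
    """Return disjoint (train, val, test) lists of row indices.

    test_frac is taken from all rows; val_frac is then taken from
    the rows that remain after the test rows are removed.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    n_test = round(n_rows * test_frac)
    test = indices[:n_test]
    remaining = indices[n_test:]

    n_val = round(len(remaining) * val_frac)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # prints: 536 134 330
```

With 1000 rows this leaves 330 rows that the model never sees during training or tuning, which is the role the test set plays in the explanation above.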

Now, when you use

X_train, X_test, y_train, y_test = train_test_split(features, results, test_size=0.33)

you split the features and results into 33% of the data for testing and 67% for training. Now you can do one of two things:

1. Use X_test and y_test as the validation set in model.fit(...). Or,
2. Use them for the final prediction in model.predict(...).

So, if you choose to use the test set as a validation set (option 1), you would do the following:

model.fit(x=X_train, y=y_train,
          validation_data=(X_test, y_test), ...)

In the training log, you will get the validation results along with the training score. If you later compute model.evaluate(X_test, y_test), the result should match the validation results from the final epoch.

Now, if you choose to keep the test set for the final prediction or final evaluation (option 2), then you need to create a separate validation set or use the validation_split argument as follows:

model.fit(x=X_train, y=y_train,
          validation_split=0.2, ...)

Keras will set aside 20% of the training data (the last 20% of X_train and y_train, taken before shuffling) and use it for validation. Lastly, for the final evaluation of your model, you can do the following:
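To make the effect of validation_split concrete, here is a small library-free sketch that mimics the documented behaviour: the validation samples are the last fraction of the data passed to fit, taken before any shuffling. The helper `holdout_tail` is hypothetical and is not part of Keras, and the exact rounding applied at the cut point is an assumption.

```python
import math


def holdout_tail(x, y, validation_split):
    """Split x and y into (train, validation) parts by cutting off the tail.

    Mimics how validation_split is documented to behave: the validation
    samples come from the end of the data, before shuffling.
    """
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    # Assumed rounding: keep floor(n * (1 - split)) rows for training.
    split_at = math.floor(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


# 670 training rows, as left over from a 1000-row set with test_size=0.33.
x_rows = list(range(670))
y_rows = [i % 2 for i in x_rows]
(x_fit, y_fit), (x_val, y_val) = holdout_tail(x_rows, y_rows, 0.2)
print(len(x_fit), len(x_val))
```

Because the held-out rows come from the tail, data that is sorted (for example by class or by date) should be shuffled before it is passed to fit; train_test_split already shuffles by default.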

y_pred = model.predict(X_test, batch_size=50)

Now you can compare y_test and y_pred using some relevant metrics.
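For instance, the comparison could look like the sketch below, which computes accuracy and mean squared error by hand. The values in `y_test` and `y_pred` are made up for illustration, and the 0.5 threshold assumes a binary classifier with a single sigmoid output; with real data you would more likely use a metrics library.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    if len(y_true) != len(y_prob):
        raise ValueError("inputs must have the same length")
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predicted probabilities, for illustration only.
y_test = [1, 0, 1, 1, 0]
y_pred = [0.9, 0.2, 0.4, 0.7, 0.6]

print("accuracy:", accuracy(y_test, y_pred))
print("mse:", mean_squared_error(y_test, y_pred))
```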
