Splitting data into training, testing and validation sets when making a Keras model
Problem description
I'm a little confused about splitting the dataset when building and evaluating Keras machine learning models. Let's say I have a dataset of 1000 rows.
features = df.iloc[:,:-1]
results = df.iloc[:,-1]
Now I want to split this data into training and testing (33% of data for testing, 67% for training):
X_train, X_test, y_train, y_test = train_test_split(features, results, test_size=0.33)
I have read on the internet that fitting the data to the model should look like this:
history = model.fit(features, results, validation_split=0.2, epochs=10, batch_size=50)
So I'm fitting the full data (features and results) to my model, and from that data I'm using 20% for validation: validation_split = 0.2.
So basically, my model will be trained on 80% of the data and tested on 20% of it.
The confusion starts when I need to evaluate the model:
score = model.evaluate(X_test, y_test, batch_size=50)
Is this correct? I mean, why should I split the data into training and testing, and where do X_train and y_train go?
Can you please explain what the correct order of steps is for creating a model?
Generally, at training time (model.fit), you work with two sets: a training set and a validation (also called tuning or development) set. You train the model on the training set and use the validation set to find the best hyperparameters. When you're done, you can test the model on an unseen data set, one that was completely hidden from the model, unlike the training and validation sets.
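As a rough illustration of the three sets described above, the sketch below partitions row indices in plain Python. The helper name three_way_split and its parameters are hypothetical and are not part of Keras or scikit-learn; it only mimics the idea of holding out a test set first and then carving a validation set out of what remains, using the question's numbers (1000 rows, 33% for testing, 20% of the rest for validation).

```python
import random


def three_way_split(n_samples, test_size=0.33, validation_split=0.2, seed=0):
    """Hypothetical helper: split row indices into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so each set is a random sample

    n_test = round(n_samples * test_size)
    test_idx = indices[:n_test]  # kept away from the model until the very end
    remaining = indices[n_test:]

    # Carve the validation set out of the remaining rows only.
    n_val = round(len(remaining) * validation_split)
    split_at = len(remaining) - n_val
    train_idx = remaining[:split_at]
    val_idx = remaining[split_at:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 536 134 330
```

With these assumed fractions, only 536 of the 1000 rows are used to update the weights, 134 guide hyperparameter choices, and 330 stay unseen for the final check.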
Now, when you ran
X_train, X_test, y_train, y_test = train_test_split(features, results, test_size=0.33)
you split features and results into 33% of the data for testing and 67% for training. You can now do one of two things:
1. Use X_test and y_test as the validation set in model.fit(...), or
2. Use them for the final prediction with model.predict(...).
So, if you choose to use the test set as the validation set (number 1), you would do the following:
model.fit(x=X_train, y=y_train,
          validation_data=(X_test, y_test), ...)
In the training log, you will get the validation results along with the training scores. The validation results from the final epoch should match what you get if you later compute model.evaluate(X_test, y_test).
Now, if you choose to keep the test set for the final prediction or final evaluation (number 2), then you need to create a separate validation set or use the validation_split argument as follows:
model.fit(x=X_train, y=y_train,
          validation_split=0.2, ...)
Keras will set aside 20% (a fraction of 0.2) of the training data (X_train and y_train), taken from the end of the arrays before any shuffling, and use it for validation. Lastly, for the final evaluation of your model, you can do the following:
y_pred = model.predict(X_test, batch_size=50)
Now you can compare y_test and y_pred using some relevant metrics.
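Which metric is relevant depends on the task. As a minimal sketch, assuming y_test and y_pred are plain Python lists (in practice they would be arrays, and a classifier's model.predict output would usually need converting to class labels first), two common choices can be computed by hand. The function names accuracy and mean_absolute_error are illustrative helpers rather than Keras calls, and the sample values are made up for demonstration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels (classification)."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only; they do not come from a real model.
y_test_labels = [1, 0, 1, 1]
y_pred_labels = [1, 0, 0, 1]
print(accuracy(y_test_labels, y_pred_labels))  # 0.75

y_test_values = [2.0, 4.0, 6.0]
y_pred_values = [2.5, 4.0, 5.0]
print(mean_absolute_error(y_test_values, y_pred_values))  # 0.5
```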