Add random forest predictions as a column in the test file

Question

I am working in Python pandas (in a Jupyter notebook), where I created a Random Forest model for the Titanic data set: https://www.kaggle.com/c/titanic/data

I read in the test and train data, then I clean them and add new columns (the same columns to both).
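
The easiest way to keep train and test columns identical is to let one function clean both. Here is a minimal sketch that assumes a Fare column is available. The prepare helper, the sample rows, and the Fare-based rule for richness are all hypothetical, because the question does not say how richness is built. The Age fill value is computed from the training data only and then reused for the test data.

```python
import numpy as np
import pandas as pd

def prepare(df: pd.DataFrame, age_fill: float) -> pd.DataFrame:
    """Hypothetical cleaning step, applied identically to train and test."""
    out = df.copy()
    # Encode Sex as 0/1 so scikit-learn can use it
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # Fill missing ages with a value computed from the training data
    out["Age"] = out["Age"].fillna(age_fill)
    # Illustrative stand-in for the question's 'richness' feature (rule is assumed)
    out["richness"] = (out["Fare"] > 50).astype(int)
    return out

raw_train = pd.DataFrame({
    "Pclass": [1, 3, 2], "Sex": ["female", "male", "male"],
    "Age": [38.0, np.nan, 30.0], "Fare": [71.3, 7.25, 13.0],
})
raw_test = pd.DataFrame({
    "Pclass": [3, 1], "Sex": ["male", "female"],
    "Age": [34.5, np.nan], "Fare": [7.83, 80.0],
})

age_fill = raw_train["Age"].median()  # statistic from training rows only
train_data = prepare(raw_train, age_fill)
test_data = prepare(raw_test, age_fill)
print(list(train_data.columns) == list(test_data.columns))  # True
```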

After fitting and re-fitting the model and trying boosting, etc., I settle on one model:

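 from sklearn.ensemble import RandomForestClassifier
 from sklearn.model_selection import cross_val_score
 X2 = train_data[['Pclass','Sex','Age','richness']]
 rfc_model_3 = RandomForestClassifier(n_estimators=200)
 %time cross_val_score(rfc_model_3, X2, Y_target).mean()
 rfc_model_3.fit(X2, Y_target)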
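
The order in this snippet matters. cross_val_score fits clones of the estimator, so rfc_model_3 is still unfitted afterwards and needs the separate fit call. The sketch below runs on random stand-in data (not the real Titanic set) to make this visible. The random_state argument is added only so the run can be repeated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Random stand-in for the cleaned Titanic features (not real data)
rng = np.random.default_rng(0)
n = 100
X2 = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 80, n),
    "richness": rng.integers(0, 2, n),
})
Y_target = pd.Series(rng.integers(0, 2, n), name="Survived")

rfc_model_3 = RandomForestClassifier(n_estimators=200, random_state=0)
# One accuracy score per fold; cv defaults to 5 folds in scikit-learn >= 0.22
scores = cross_val_score(rfc_model_3, X2, Y_target)
print(len(scores), scores.mean())

# cross_val_score only fitted clones, so fit the model itself on all training rows
rfc_model_3.fit(X2, Y_target)
```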

Then I predict whether each passenger survived:

 import pandas as pd
 X_test = test_data[['Pclass','Sex','Age','richness']]
 predictions = rfc_model_3.predict(X_test)  # 1-D array, one label per test row
 preds = pd.DataFrame(predictions, columns=['Survived'])
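
The intermediate preds DataFrame gets a fresh 0..n-1 index, but test_data keeps whatever index it already had. If the two differ (for example after dropping rows), index-aligned operations such as join or pd.concat(..., axis=1) will misalign the predictions. The sketch below uses made-up values and assumes X_test shares test_data's index, which holds for a plain column selection.

```python
import numpy as np
import pandas as pd

# Hypothetical test frame whose index is not 0..n-1
test_data = pd.DataFrame({"Pclass": [3, 1, 2]}, index=[892, 893, 894])
predictions = np.array([0, 1, 0])  # stand-in for rfc_model_3.predict(X_test)

# Default index 0, 1, 2: no labels match test_data's index
preds_default = pd.DataFrame(predictions, columns=["Survived"])
# Reusing the test index keeps each prediction on its own row
preds_aligned = pd.DataFrame(predictions, columns=["Survived"], index=test_data.index)

print(test_data.join(preds_default)["Survived"].isna().all())  # True
print(test_data.join(preds_aligned)["Survived"].tolist())      # [0, 1, 0]
```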

Is there a way for me to add the predictions as a column into the test file?

Recommended answer

rfc_model_3 = RandomForestClassifier(n_estimators=200)
rfc_model_3.fit(X2, Y_target)
rfc_model_3.predict(X_test)

returns y, an array of shape [n_samples] (see the docs), so you should be able to add the model output directly to X_test without creating an intermediate DataFrame:

X_test['survived'] = rfc_model_3.predict(X_test)

If you want the intermediate result anyway, @EdChum's suggestion in the comments would work fine.
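
X_test = test_data[['Pclass','Sex','Age','richness']] builds a separate DataFrame. As a result, X_test['survived'] = ... adds the column to X_test only, and depending on the version, pandas may emit a SettingWithCopyWarning for it. To get the predictions into the test data itself and then into a file, one option is to assign to test_data and call to_csv. Below is a minimal sketch on tiny made-up frames. The values and the PassengerId column are illustrative, and a StringIO buffer stands in for a real output path.

```python
import io
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

features = ["Pclass", "Sex", "Age", "richness"]
# Tiny made-up frames standing in for the cleaned Titanic data
train_data = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Sex": [1, 0, 0, 1, 0, 1],
    "Age": [38.0, 22.0, 30.0, 26.0, 54.0, 14.0],
    "richness": [1, 0, 0, 0, 1, 0],
    "Survived": [1, 0, 0, 1, 0, 1],
})
test_data = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass": [3, 1, 2],
    "Sex": [0, 1, 0],
    "Age": [34.5, 47.0, 62.0],
    "richness": [0, 1, 0],
})

rfc_model_3 = RandomForestClassifier(n_estimators=200, random_state=0)
rfc_model_3.fit(train_data[features], train_data["Survived"])

# Assign to test_data itself rather than to the X_test selection;
# predict returns one label per row, in the input's row order
test_data["Survived"] = rfc_model_3.predict(test_data[features])

# Write the augmented frame; the buffer stands in for a file path
buf = io.StringIO()
test_data.to_csv(buf, index=False)
print(buf.getvalue().splitlines()[0])  # PassengerId,Pclass,Sex,Age,richness,Survived
```

With a real file, test_data.to_csv('test_with_predictions.csv', index=False) would write the same content. That file name is only an example.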