Show overfitting with sklearn & random forest

Question

I followed this tutorial to create a simple image classification script:

https://blog.hyperiondev.com/index.php/2019/02/18/machine-learning/

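import time

import scipy.io
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

train_data = scipy.io.loadmat('extra_32x32.mat')
# extract the images and labels from the dictionary object
X = train_data['X']
y = train_data['y']

# flatten each image into one row, giving X the shape (n_samples, n_features)
X = X.reshape(X.shape[0]*X.shape[1]*X.shape[2],X.shape[3]).T
y = y.reshape(y.shape[0],)
X, y = shuffle(X, y, random_state=42)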
....
clf = RandomForestClassifier()
print(clf)
start_time = time.time()
# Output of print(clf):
# RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#                max_depth=None, max_features='auto', max_leaf_nodes=None,
#                min_impurity_split=1e-07, min_samples_leaf=1,
#                min_samples_split=2, min_weight_fraction_leaf=0.0,
#                n_estimators=10, n_jobs=1, oob_score=False, random_state=None,
#                verbose=0, warm_start=False)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test,preds))

It gave me an accuracy of approximately 0.7.

Is there some way to visualize or show where, when, or if the model is overfitting? I believe this can be shown by training the model until the training accuracy keeps increasing while the validation accuracy starts to decrease. But how can I do that in code?

Recommended answer
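
Because a random forest is not trained in epochs, one way to show the effect the question describes is to vary a complexity parameter and compare training and validation accuracy at each setting. The sketch below assumes `max_depth` as that parameter and uses synthetic data from `make_classification` as a stand-in for the flattened images; both choices, along with the values in `depths`, are illustrative assumptions and not part of the original script.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic stand-in for the flattened image data (assumption for illustration).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=42
)

# Hypothetical range of tree depths to compare.
depths = [1, 2, 4, 8, 16]

# validation_curve refits the model for every depth on each CV fold and
# returns arrays of shape (n_depths, n_folds) for train and validation scores.
train_scores, val_scores = validation_curve(
    RandomForestClassifier(n_estimators=20, random_state=42),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=3,
    scoring="accuracy",
)

# Average over the folds to get one score per depth.
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for depth, tr, va in zip(depths, train_mean, val_mean):
    print(f"max_depth={depth:2d}  train={tr:.3f}  validation={va:.3f}  gap={tr - va:.3f}")
```

A training score that keeps climbing toward 1.0 while the validation score flattens or drops suggests overfitting. The two mean arrays can also be plotted against `depths` (for example with matplotlib) to visualize the gap.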

Another option is to use a library like Optuna, which will test various hyperparameter combinations for you; you could then use the methods mentioned above.
