How to choose n_estimators in RandomForestClassifier?

Views: 256 · Published: 2021/4/22 19:08:06 · Tags: python, classification, random-forest, hyperparameters

Problem description

I'm building a random forest binary classifier in Python on a pre-processed dataset with 4898 instances, a 60-40 stratified train/test split, and 78% of the data belonging to one target label and the rest to the other. What value of n_estimators should I choose to get the most practically useful / best possible random forest classifier model? I plotted the accuracy vs n_estimators curve using the code snippet below. x_train and y_train are the features and target labels in the training set, and x_test and y_test are the features and target labels in the test set.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Test accuracy for each forest size from 1 to 199 trees
scores = []
for k in range(1, 200):
    rfc = RandomForestClassifier(n_estimators=k)
    rfc.fit(x_train, y_train)
    y_pred = rfc.predict(x_test)
    scores.append(accuracy_score(y_test, y_pred))

import matplotlib.pyplot as plt
# Jupyter/IPython magic; drop this line when running as a plain script
%matplotlib inline

# plot the relationship between n_estimators and testing accuracy
# plt.plot(x_axis, y_axis)
plt.plot(range(1, 200), scores)
plt.xlabel('Value of n_estimators for Random Forest Classifier')
plt.ylabel('Testing Accuracy')
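
As an aside on the loop above: each forest is trained with a different random seed and scored on a single test split, which by itself can make the curve jumpy. The following is only a sketch of one possible variant, not a recommended procedure. It assumes a synthetic dataset from make_classification as a stand-in for the real data (smaller than the real 4898 instances to keep it quick), a coarse grid of four forest sizes, 5-fold stratified cross-validation, and a fixed random_state; all of these choices are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the real data (assumption): binary target, ~78% majority class
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.78, 0.22], random_state=0
)

# Same folds for every forest size, so the sizes are compared on equal terms
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = [10, 50, 100, 200]  # coarse grid (assumption) instead of every value 1..199

results = {}
for n in grid:
    rfc = RandomForestClassifier(n_estimators=n, random_state=0)
    cv_out = cross_validate(rfc, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
    # Mean score over the 5 folds for each metric
    results[n] = (cv_out["test_accuracy"].mean(), cv_out["test_roc_auc"].mean())

for n, (acc, auc) in results.items():
    print(f"n_estimators={n:3d}  accuracy={acc:.3f}  roc_auc={auc:.3f}")
```

Averaging over folds and fixing the seed removes some of the run-to-run noise, so differences between forest sizes are easier to read than in a single-split curve.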

In the accuracy vs n_estimators plot, high values of n_estimators clearly give a good accuracy score, but the curve fluctuates randomly even between neighbouring values of n_estimators, so I can't pick the best one precisely. I only want to know about tuning the n_estimators hyperparameter: how should I choose it? Should I use a ROC or CAP curve instead of accuracy_score? Thanks.
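
To put the fluctuation in context, here is a rough back-of-the-envelope sketch of how much an accuracy score measured on this test split can move from sampling noise alone. The instance count, split ratio, and class share come from the question; the accuracy level of 0.85 is purely a hypothetical value chosen for illustration. The binomial approximation also assumes independent test instances and ignores the extra randomness of the forest itself.

```python
import math

n_total = 4898           # instances (from the question)
test_fraction = 0.40     # 60-40 split (from the question)
majority_share = 0.78    # share of the larger class (from the question)
assumed_accuracy = 0.85  # hypothetical accuracy level, not stated in the question

# Approximate size of the test split
n_test = round(n_total * test_fraction)

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = majority_share

# Binomial standard error of an accuracy estimate on n_test instances
std_error = math.sqrt(assumed_accuracy * (1 - assumed_accuracy) / n_test)

# Approximate 95% interval half-width (normal approximation)
half_width = 1.96 * std_error

print(f"test instances: {n_test}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.2f}")
print(f"standard error of accuracy: {std_error:.4f}")
print(f"approx. 95% half-width: {half_width:.4f}")
```

Under these assumptions, differences between neighbouring n_estimators values that are smaller than the printed half-width may not be distinguishable from noise, and the 0.78 baseline shows why accuracy alone can look flattering on imbalanced data.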
