Classification results depend on random_state?


Question

I want to implement an AdaBoost model using scikit-learn (sklearn). My question is similar to another question, but not exactly the same. As far as I understand, the random_state variable described in the documentation is for randomly splitting the training and testing sets, according to the previous link. So, if I understand correctly, my classification results should not depend on the seed. Is that correct? Should I be worried if my classification results turn out to depend on the random_state variable?

Recommended answer

Your classification scores will depend on random_state. As @Ujjwal rightly said, it is used for splitting the data into training and test sets. Beyond that, many algorithms in scikit-learn use random_state to select subsets of features, select subsets of samples, determine initial weights, and so on.
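As a rough illustration of the point above, the sketch below trains an AdaBoostClassifier on synthetic data and varies only the seed used for the train/test split. The dataset from make_classification, the helper name adaboost_score, and all parameter values are assumptions chosen for the example, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration (an assumption of this sketch).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)


def adaboost_score(split_seed, model_seed):
    """Hypothetical helper: split, fit AdaBoost, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=split_seed
    )
    clf = AdaBoostClassifier(n_estimators=50, random_state=model_seed)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# Vary only the split seed; the scores will typically differ between seeds.
scores = {seed: adaboost_score(seed, model_seed=0) for seed in range(5)}
for seed, score in scores.items():
    print(f"split seed {seed}: accuracy {score:.3f}")

# Repeating a run with the same seeds reproduces the same score.
repeat_a = adaboost_score(42, model_seed=0)
repeat_b = adaboost_score(42, model_seed=0)
print("reproducible:", repeat_a == repeat_b)
```

Some spread between seeds is expected and is not a bug in itself. Averaging over several seeds, or using cross-validation, usually gives a more stable estimate than any single split.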

For example:

  • Tree-based estimators (such as DecisionTreeClassifier and RandomForestClassifier) use random_state for random selection of features and samples.

  • Clustering estimators such as KMeans use random_state to initialize the cluster centers.

  • SVMs use it for probability estimation (in SVC, it seeds the shuffling of the data when probability=True).
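The estimator-level randomness listed above can be pinned by passing a fixed random_state. The following is a minimal sketch, assuming synthetic data from make_blobs and two hypothetical helpers (forest_probabilities, kmeans_centers), that checks whether two fits with the same seed agree.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Synthetic, well-separated blobs (an assumption of this sketch).
X, y = make_blobs(n_samples=200, centers=3, random_state=0)


def forest_probabilities(seed):
    """Fit a small random forest; the seed controls its bootstrap
    samples and feature subsampling."""
    clf = RandomForestClassifier(n_estimators=20, random_state=seed)
    return clf.fit(X, y).predict_proba(X)


def kmeans_centers(seed):
    """Fit KMeans with a single initialization; the seed controls
    where the initial centers are placed."""
    km = KMeans(n_clusters=3, n_init=1, random_state=seed)
    return km.fit(X).cluster_centers_


# Same seed, same data: the fitted results should match.
same_forest = np.array_equal(forest_probabilities(1), forest_probabilities(1))
same_kmeans = np.allclose(kmeans_centers(1), kmeans_centers(1))
print("forest reproducible:", same_forest)
print("kmeans reproducible:", same_kmeans)
```

With random_state left at None, each fit draws fresh random numbers, so repeated runs may give slightly different models.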

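From the documentation:

If your code relies on a random number generator, it should never use functions like numpy.random.random or numpy.random.normal. This approach can lead to repeatability issues in tests. Instead, a numpy.random.RandomState object should be used, which is built from a random_state argument passed to the class or function.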
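One way to follow this advice in your own code is sketched below. The function noisy_sample is a hypothetical example; sklearn.utils.check_random_state is the scikit-learn helper that turns None, an int, or an existing RandomState into a numpy.random.RandomState instance.

```python
import numpy as np
from sklearn.utils import check_random_state


def noisy_sample(n, random_state=None):
    """Hypothetical function that draws n normal samples.

    Randomness comes from a RandomState built from the random_state
    argument, not from the global numpy.random functions.
    """
    rng = check_random_state(random_state)
    return rng.normal(size=n)


# The same seed gives the same draws, so tests can be repeated exactly.
a = noisy_sample(3, random_state=0)
b = noisy_sample(3, random_state=0)
print(np.array_equal(a, b))

# An existing RandomState can be passed through and shared by callers.
shared = np.random.RandomState(123)
c = noisy_sample(3, random_state=shared)
```

Accepting random_state as an argument keeps the caller in control of reproducibility and avoids touching the global NumPy seed.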

For a better understanding, read the following questions and answers:

  • Choosing random_state for sklearn algorithms
  • confused about random_state in decision tree of scikit learn
