scikit-learn random state in splitting dataset

Question

Can anyone tell me why we set the random state to zero when splitting a dataset into training and test sets?

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

I have also seen cases like this one, where the random state is set to 1:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)

What effect does this random state have in cross-validation as well?

Recommended answer

It doesn't matter whether random_state is 0, 1, or any other integer. What matters is that it is set to the same value every time, if you want to validate your processing over multiple runs of the code. By the way, I have seen random_state=42 used in many official scikit-learn examples, as well as elsewhere.
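As a minimal sketch of this point, the snippet below splits a small made-up dataset (the arrays X and y here are illustrative assumptions, not the asker's data) twice with the same seed and once with a different seed. The two same-seed splits are identical; the other seed is just as valid and simply selects a different subset in general.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same seed produce identical splits.
_, _, _, y_test_a = train_test_split(X, y, test_size=0.30, random_state=0)
_, _, _, y_test_b = train_test_split(X, y, test_size=0.30, random_state=0)
same_seed_matches = np.array_equal(y_test_a, y_test_b)

# A different seed is equally valid; it just drives a different shuffle.
_, _, _, y_test_c = train_test_split(X, y, test_size=0.30, random_state=1)

print(same_seed_matches)  # True
print(y_test_a, y_test_c)
```

The particular integer has no special meaning; it only needs to stay fixed between the runs you want to compare.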

random_state, as the name suggests, is used to initialize the internal random number generator, which in your case decides how the data is split into train and test indices. The documentation states:

If random_state is None or np.random, then a randomly-initialized RandomState object is returned.

If random_state is an integer, then it is used to seed a new RandomState object.

If random_state is a RandomState object, then it is passed through.
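These three cases can be checked with a short sketch. It assumes the quoted passage describes the behavior of the sklearn.utils.check_random_state helper, which the answer does not name explicitly; the variable names are illustrative.

```python
import numpy as np
from sklearn.utils import check_random_state

# An integer seeds a brand-new RandomState, so equal seeds give equal draws.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
draws_match = np.array_equal(
    rng_a.randint(0, 100, size=5), rng_b.randint(0, 100, size=5))

# An existing RandomState instance is passed through (same object returned).
existing = np.random.RandomState(42)
passed_through = check_random_state(existing) is existing

# None yields a RandomState that this code did not seed,
# so its draws are not fixed by anything written here.
unseeded_rng = check_random_state(None)
is_random_state = isinstance(unseeded_rng, np.random.RandomState)

print(draws_match, passed_through, is_random_state)  # True True True
```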

The purpose is to let you check and validate your results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code. Unless some other source of randomness is present in the process, the results will be the same on every run. This helps in verifying the output.
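The same reasoning carries over to the cross-validation part of the question. The sketch below is an illustration only: it assumes a KFold splitter with shuffle=True and uses made-up data and a hypothetical helper, fold_test_indices. With a fixed seed, the folds come out identical on every run.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, made up for illustration: 6 samples, 2 features.
X = np.arange(12).reshape(6, 2)

def fold_test_indices(seed):
    """Return the test indices of each fold for a shuffled 3-fold split."""
    kf = KFold(n_splits=3, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in kf.split(X)]

# Building the splitter twice with the same seed gives the same folds.
folds_run_1 = fold_test_indices(0)
folds_run_2 = fold_test_indices(0)
reproducible = folds_run_1 == folds_run_2

print(reproducible)  # True
print(folds_run_1)
```

Without shuffling, KFold splits the samples in their original order, so the seed only matters when shuffle=True.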
