Python random state in splitting dataset

Question

I'm fairly new to Python. Can anyone tell me why we set random_state to zero when splitting the data into training and test sets?

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=0)

I have also seen cases like this, where random_state is set to one:

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.30, random_state=1)

What effect does this random state have in cross-validation as well?

Recommended answer

It doesn't matter whether random_state is 0, 1, or any other integer. What matters is that it is set to the same value each time if you want to validate your processing across multiple runs of the code. Incidentally, random_state=42 is used in many official scikit-learn examples and elsewhere.

As the name suggests, random_state is used to initialize the internal random number generator, which in your case decides how the data is split into train and test indices. The documentation states:

If random_state is None or np.random, then a randomly-initialized RandomState object is returned.

If random_state is an integer, then it is used to seed a new RandomState object.

If random_state is a RandomState object, then it is passed through.
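
The three cases above can be checked with a minimal sketch. It assumes scikit-learn and NumPy are installed and uses scikit-learn's helper sklearn.utils.check_random_state, which converts a random_state argument into a generator object. The variable names are only illustrative.

```python
import numpy as np
from sklearn.utils import check_random_state

# Integer: a new RandomState is seeded with it, so two calls with the
# same integer give generators that produce identical sequences.
rng_a = check_random_state(0)
rng_b = check_random_state(0)
same_sequence = np.array_equal(rng_a.permutation(10), rng_b.permutation(10))

# RandomState instance: the very same object is passed through.
existing = np.random.RandomState(1)
passed_through = check_random_state(existing) is existing

# None: a RandomState that was not seeded by us is returned
# (in current versions, NumPy's global generator).
unseeded = check_random_state(None)
is_random_state = isinstance(unseeded, np.random.RandomState)

print(same_sequence, passed_through, is_random_state)
```

Because the None case is not seeded by your code, leaving random_state unset generally gives a different split on each run.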

The purpose is to let you check and validate results when running the code multiple times. Setting random_state to a fixed value guarantees that the same sequence of random numbers is generated each time you run the code. Unless some other source of randomness is present in the process, the results will be the same on every run, which helps in verifying the output.
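
As a rough illustration of this reproducibility, the sketch below splits a small made-up dataset twice with the same seed and compares the results. The arrays X and y and the helper names are hypothetical. The last part assumes the same idea carries over to cross-validation through a shuffled KFold splitter, which also accepts random_state.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Toy data (made up for illustration): 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

def split_test_labels(seed):
    """Return the test labels of a 70/30 split made with the given seed."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.30, random_state=seed)
    return y_test

# The same seed reproduces the same split on every call.
same_split = np.array_equal(split_test_labels(0), split_test_labels(0))

def fold_test_indices(seed):
    """Return the test indices of each fold of a shuffled 5-fold split."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test_idx.tolist() for _, test_idx in kf.split(X)]

# The same seed also reproduces the same cross-validation folds.
same_folds = fold_test_indices(0) == fold_test_indices(0)

print(same_split, same_folds)
```

A different seed will usually, though not necessarily, produce a different split, so scores can vary slightly from seed to seed. The specific integer carries no special meaning.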
