Random state (Pseudo-random number) in Scikit learn

Question

I want to implement a machine learning algorithm in scikit-learn, but I don't understand what the random_state parameter does. Why should I use it?

I also don't understand what a pseudo-random number is.

Recommended answer

train_test_split splits arrays or matrices into random train and test subsets. That means that every time you run it without specifying random_state, you will get a different result; this is expected behavior. For example:

Run 1:

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> a, b = np.arange(10).reshape((5, 2)), list(range(5))
>>> train_test_split(a, b)
[array([[6, 7],
        [8, 9],
        [4, 5]]),
 array([[2, 3],
        [0, 1]]), [3, 4, 2], [1, 0]]

Run 2:

>>> train_test_split(a, b)
[array([[8, 9],
        [4, 5],
        [0, 1]]),
 array([[6, 7],
        [2, 3]]), [4, 2, 0], [3, 1]]

It changes. On the other hand, if you use random_state=some_number, you can guarantee that the output of Run 1 will equal the output of Run 2, i.e. your split will always be the same. The actual number (42, 0, 21, ...) doesn't matter. The important thing is that every time you use 42, you will get the same output from the first split you make. This is useful if you want reproducible results, for example in documentation, so that everybody sees the same numbers when they run the examples. In practice, I would say you should set random_state to some fixed number while you test things, but remove it in production if you really need a random (and not a fixed) split.
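As a minimal sketch of the point above, the snippet below calls train_test_split twice with the same seed and checks that both calls return the same split. The variable names and the seed value 42 are only illustrative, and it assumes a scikit-learn version where train_test_split lives in sklearn.model_selection.

```python
import numpy as np
from sklearn.model_selection import train_test_split

a = np.arange(10).reshape((5, 2))
b = list(range(5))

# Two independent calls that pass the same seed.
a_train_1, a_test_1, b_train_1, b_test_1 = train_test_split(a, b, random_state=42)
a_train_2, a_test_2, b_train_2, b_test_2 = train_test_split(a, b, random_state=42)

# Compare every piece of the two splits.
same_split = (
    np.array_equal(a_train_1, a_train_2)
    and np.array_equal(a_test_1, a_test_2)
    and b_train_1 == b_train_2
    and b_test_1 == b_test_2
)
print(same_split)  # True
```

Dropping the random_state argument from both calls makes the comparison depend on chance, which matches the behavior of Run 1 and Run 2.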

Regarding your second question, a pseudo-random number generator is a number generator whose output looks random but is not truly random: the numbers are computed deterministically from a starting value (the seed), which is why passing the same random_state reproduces the same split. Why they are not truly random is out of the scope of this question and probably won't matter in your case; you can take a look here for more details.
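To make the idea of a seeded generator concrete, here is a small sketch using NumPy's np.random.RandomState. The seeds and variable names are arbitrary choices for illustration: two generators created with the same seed produce the same sequence, while a generator with a different seed will almost certainly produce a different one.

```python
import numpy as np

# Two generators seeded identically, and one seeded differently.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)
rng_c = np.random.RandomState(0)

seq_a = rng_a.randint(0, 100, size=5)
seq_b = rng_b.randint(0, 100, size=5)
seq_c = rng_c.randint(0, 100, size=5)

# Same seed, same "random" numbers.
print(np.array_equal(seq_a, seq_b))  # True

# Print the sequences to compare the different seed by eye.
print(seq_a, seq_c)
```

This is the same mechanism scikit-learn relies on when you pass an integer as random_state.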
