How to seed the random number generator for scikit-learn?

Question

I'm trying to write a unit test for some of my code that uses scikit-learn. However, my unit tests seem to be non-deterministic.

AFAIK, the only places in my code where scikit-learn uses any randomness are its LogisticRegression model and its train_test_split, so I have the following:

RANDOM_SEED = 5
self.lr = LogisticRegression(random_state=RANDOM_SEED)
X_train, X_test, y_train, test_labels = train_test_split(docs, labels, test_size=TEST_SET_PROPORTION, random_state=RANDOM_SEED)

But this doesn't seem to work: even when I pass a fixed docs and a fixed labels, the predicted probabilities on a fixed validation set vary from run to run.

I also tried adding a numpy.random.seed(RANDOM_SEED) call at the top of my code, but that didn't seem to work either.

Is there anything I'm missing? Is there a way to pass a seed to scikit-learn in a single place, so that the seed is used throughout all of scikit-learn's invocations?

Recommended answer

from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# Load a fixed dataset so the inputs are identical on every run.
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Use one seed for both the model and the split.
RANDOM_SEED = 5
lr = linear_model.LogisticRegression(random_state=RANDOM_SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=RANDOM_SEED)
lr.fit(X_train, y_train)
lr.score(X_test, y_test)

This has produced 0.93333333333333335 on several consecutive runs for me, so the way you did it seems OK. Another option is to call np.random.seed(), or to use Sacred to keep the randomness documented. Using random_state is what the docs describe:

If your code relies on a random number generator, it should never use functions like numpy.random.random or numpy.random.normal. This approach can lead to repeatability issues in unit tests. Instead, a numpy.random.RandomState object should be used, which is built from a random_state argument passed to the class or function.
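As a rough illustration of the pattern the docs describe, here is a minimal sketch. The helper fit_and_predict and the added noise step are hypothetical and not part of the original question; the noise is there only so that the function draws its own random numbers. sklearn.utils.check_random_state converts None, an int, or an existing RandomState into a RandomState object, which is then handed to every component that needs randomness.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils import check_random_state


def fit_and_predict(X, y, random_state=None):
    """Hypothetical helper: split, fit, and return test-set probabilities."""
    # Turn None, an int, or a RandomState into a RandomState instance.
    rng = check_random_state(random_state)
    # Draw from the local generator instead of the global numpy.random functions.
    noise = rng.normal(scale=0.01, size=X.shape)
    # Pass the same generator to every scikit-learn call that accepts random_state.
    X_train, X_test, y_train, _ = train_test_split(
        X + noise, y, test_size=0.3, random_state=rng)
    model = LogisticRegression(random_state=rng, max_iter=1000)
    model.fit(X_train, y_train)
    return model.predict_proba(X_test)


X, y = load_iris(return_X_y=True)
first = fit_and_predict(X, y, random_state=5)
second = fit_and_predict(X, y, random_state=5)
# Two calls with the same seed should yield matching probabilities.
print(np.allclose(first, second))
```

A unit test can then pass a fixed integer seed into the one entry point and compare two runs, rather than depending on the global numpy.random state.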
