Python sklearn RandomForestClassifier non-reproducible results


Question

I've been using sklearn's random forest to compare several models, and I noticed that it gives different results even with the same seed. I tried it two ways: calling random.seed(1234), and using the random forest's built-in random_state = 1234. In both cases I get non-repeatable results. What have I missed?

# 1
random.seed(1234)
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf = 10)
# or 2
RandomForestClassifier(max_depth=5, max_features=5, criterion='gini', min_samples_leaf = 10, random_state=1234)

Any ideas? Thanks!!

Adding a more complete version of my code:

clf = RandomForestClassifier(max_depth=60, max_features=60,
                             criterion='entropy',
                             min_samples_leaf=3, random_state=seed)
# As described, I tried random_state in several ways, still different results
clf = clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
predicted_prob = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(np.array(y_test), predicted_prob)
auc = metrics.auc(fpr, tpr)
print(auc)

It's been quite a while, but I think using RandomState might solve the problem. I haven't tested it myself, but if you're reading this, it's worth a shot. Also, it is generally preferable to use RandomState rather than random.seed().

Recommended answer

First, make sure you have up-to-date versions of the required modules (e.g. scipy, numpy). Note that random.seed(1234) seeds the numpy generator only when random refers to numpy.random (for example after from numpy import *); the standard-library random module is a separate generator that scikit-learn does not use.
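The difference between the two seed() calls can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names are illustrative only.

```python
import random

import numpy as np

# Seeding the standard-library generator does not touch NumPy's global
# state, so the NumPy draws after each random.seed() call are unrelated.
random.seed(1234)
a = np.random.rand(3)
random.seed(1234)
b = np.random.rand(3)
stdlib_seed_repeats = bool(np.array_equal(a, b))

# Seeding NumPy's global generator makes the NumPy draws repeat.
np.random.seed(1234)
c = np.random.rand(3)
np.random.seed(1234)
d = np.random.rand(3)
numpy_seed_repeats = bool(np.array_equal(c, d))

print(stdlib_seed_repeats, numpy_seed_repeats)
```

Since scikit-learn draws its randomness from NumPy, only the second kind of seeding can influence a RandomForestClassifier created with random_state=None.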

When you use the random_state parameter of RandomForestClassifier, there are several options: int, RandomState instance or None.

From the documentation:

  • If int, random_state is the seed used by the random number generator;

  • If RandomState instance, random_state is the random number generator;

  • If None, the random number generator is the RandomState instance used by np.random.
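The three options above can be compared with a small experiment. This is a sketch under stated assumptions: the helper fit_proba, the dataset size and n_estimators=20 are illustrative choices, not taken from the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)


def fit_proba(random_state):
    """Fit a small forest (hypothetical helper) and return probabilities on X."""
    clf = RandomForestClassifier(n_estimators=20, max_depth=2,
                                 random_state=random_state)
    return clf.fit(X, y).predict_proba(X)


# int: every fit starts from the same seed, so repeated fits agree.
same_int = bool(np.array_equal(fit_proba(1234), fit_proba(1234)))

# RandomState instance: a fresh instance with the same seed for each fit
# also gives repeated results.
same_fresh_rs = bool(np.array_equal(fit_proba(np.random.RandomState(1234)),
                                    fit_proba(np.random.RandomState(1234))))

# A shared instance is advanced by the first fit, so the second fit starts
# from a different generator state and its result can differ.
shared = np.random.RandomState(1234)
p1, p2 = fit_proba(shared), fit_proba(shared)

print(same_int, same_fresh_rs)
```

With None, each fit draws from NumPy's global generator, so results repeat only if that generator is reseeded before every fit.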

One way to use the same generator in both cases is the following. Both classifiers draw from the same (numpy) global generator, reseeded to 1234 before each one is created, and I get reproducible results (the same results in both cases).

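import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Case 1: seed NumPy's global generator; random_state defaults to None,
# so the forest draws from that global generator.
np.random.seed(1234)
clf = RandomForestClassifier(max_depth=2)
clf.fit(X, y)

# Case 2: np.random.seed() returns None, so this is still random_state=None;
# the call matters only for its side effect of reseeding the global generator.
clf2 = RandomForestClassifier(max_depth=2, random_state=np.random.seed(1234))
clf2.fit(X, y)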

Check whether the results are the same:

all(clf.predict(X) == clf2.predict(X))
#True


Checking after running the same code 5 times:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

for i in range(5):

    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)

    # Reseed NumPy's global generator; random_state defaults to None.
    np.random.seed(1234)
    clf = RandomForestClassifier(max_depth=2)
    clf.fit(X, y)

    # np.random.seed() returns None; only its reseeding side effect matters.
    clf2 = RandomForestClassifier(max_depth=2, random_state=np.random.seed(1234))
    clf2.fit(X, y)

    print(all(clf.predict(X) == clf2.predict(X)))

Result:

True
True
True
True
True
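A more direct variant is to pass an integer random_state to every step that uses randomness. The sketch below is not from the original answer. It assumes the train/test split is also random (the question does not show how X_train and X_test were produced, so an unseeded split as a source of variation is only a guess), and SEED and run_once are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 1234  # hypothetical seed value


def run_once(seed):
    """Split, fit and score with every random step tied to `seed`."""
    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)
    # Seed the split as well as the forest (assumed to matter here).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    clf = RandomForestClassifier(max_depth=2, random_state=seed)
    clf.fit(X_train, y_train)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])


# Five independent runs should give one distinct AUC value.
aucs = [run_once(SEED) for _ in range(5)]
print(len(set(aucs)) == 1)
```

Passing the integer explicitly avoids relying on global generator state, which any other code drawing from np.random between the seed call and the fit would change.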
