Why random seed does not make results constant in Python

Question

I use the following code and would like to get the same results for the same random seed. However, with the same seed (1 in this case) I get different results from run to run. Here is the code:

import pandas as pd
import numpy as np
from random import seed
# Load scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split
seed(1) ### <-----

file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data'
dataset2 = pd.read_csv(file_path, header=None, sep=',')

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

#Encoding
y = le.fit_transform(dataset2[60])
dataset2[60] = y
train, test = train_test_split(dataset2, test_size=0.1)
y = train[60] 
y_test = test[60] 
clf = RandomForestClassifier(n_jobs=100, random_state=0)
features = train.columns[0:59] 
clf.fit(train[features], y)

# Apply the Classifier we trained to the test data
y_pred = clf.predict(test[features])

# Decode 
y_test_label = le.inverse_transform(y_test)
y_pred_label = le.inverse_transform(y_pred)


from sklearn.metrics import accuracy_score
print (accuracy_score(y_test_label, y_pred_label))

# Two runs with the same seed gave different results:
# 0.761904761905
# 0.90476190476

Recommended answer

Your code:

import numpy as np
from random import seed
seed(1) ### <-----

sets the seed of the generator in Python's built-in random module.

But sklearn relies entirely on NumPy's random number generation, as the scikit-learn documentation explains:

For testing and replicability, it is often important to have the entire execution controlled by a single seed for the pseudo-random number generator used in algorithms that have a randomized component. Scikit-learn does not use its own global random state; whenever a RandomState instance or an integer random seed is not provided as an argument, it relies on the numpy global random state, which can be set using numpy.random.seed. For example, to set an execution’s numpy global random state to 42, one could execute the following in his or her script:

import numpy as np
np.random.seed(42)
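The three cases named in the quoted passage (no seed, an integer seed, a RandomState instance) can be illustrated with sklearn.utils.check_random_state, a public scikit-learn helper that turns a random_state argument into a generator. This is only a sketch; the function name resolve_and_draw and the seed values are made up for the example.

```python
import numpy as np
from sklearn.utils import check_random_state


def resolve_and_draw(random_state):
    """Resolve a random_state argument to a generator and draw one float.

    resolve_and_draw is a hypothetical helper for this illustration only.
    """
    rng = check_random_state(random_state)
    return rng.rand()


# Case 1: nothing provided, so the draw comes from NumPy's global state.
np.random.seed(42)
from_global = resolve_and_draw(None)
np.random.seed(42)
direct_global = np.random.rand()
print(from_global == direct_global)

# Case 2: an integer seed creates a fresh, independent generator.
print(resolve_and_draw(42) == np.random.RandomState(42).rand())

# Case 3: an existing RandomState instance is used as-is.
own_rng = np.random.RandomState(7)
print(check_random_state(own_rng) is own_rng)
```

Only the first case depends on np.random.seed; the other two are unaffected by whatever the global state happens to be.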

So in general you should do:

np.random.seed(1)
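To check that the two seeds really are independent, a small experiment like the following may help. It is a minimal sketch: the helper names stdlib_draws, numpy_draws_after_stdlib_seed and numpy_draws are invented for this illustration, and it uses only the standard library and NumPy.

```python
import random

import numpy as np


def stdlib_draws(seed_value, n=3):
    """Seed Python's random module, then draw n floats from it."""
    random.seed(seed_value)
    return [random.random() for _ in range(n)]


def numpy_draws_after_stdlib_seed(seed_value, n=3):
    """Seed only Python's random module, then draw from NumPy's global state."""
    random.seed(seed_value)
    return np.random.rand(n).tolist()


def numpy_draws(seed_value, n=3):
    """Seed NumPy's global state, then draw n floats from it."""
    np.random.seed(seed_value)
    return np.random.rand(n).tolist()


# The standard library generator repeats itself after re-seeding.
print(stdlib_draws(1) == stdlib_draws(1))  # True

# NumPy's global state ignores random.seed, so these two draws are
# expected to differ (they are consecutive parts of one stream).
print(numpy_draws_after_stdlib_seed(1) == numpy_draws_after_stdlib_seed(1))

# NumPy repeats itself only once its own global state is re-seeded.
print(numpy_draws(1) == numpy_draws(1))  # True
```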

But this is only part of the story: seeding the global state is often unnecessary if you are careful to pass an explicit seed to every sklearn component you use.

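As ShreyasG mentioned, this also applies to train_test_split: in the question's code it is called without a random_state, so the split itself changes from run to run even though the classifier is seeded.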
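A sketch of the explicit-seed approach is shown below. To keep it self-contained it uses synthetic data from make_classification in place of the sonar file, so the accuracy will not match the numbers in the question. The function name run_once, the parameter names split_seed and model_seed, the sample count and n_estimators=50 are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_once(split_seed=1, model_seed=0):
    """Train and score once, seeding every randomized step explicitly."""
    # Synthetic stand-in for the real dataset (assumption): 60 features.
    X, y = make_classification(n_samples=200, n_features=60, random_state=0)

    # The split gets its own seed, so the same rows land in the test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=split_seed
    )

    # The forest gets its own seed, so the same trees are grown.
    clf = RandomForestClassifier(n_estimators=50, random_state=model_seed)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


first = run_once()
second = run_once()
print(first == second)  # True: both randomized steps are seeded
```

With both seeds passed explicitly, neither random.seed nor np.random.seed is needed for the result to repeat.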
