How to Get Reproducible Results (Keras, TensorFlow)


Question

To make the results reproducible, I've read more than 20 articles and added as many of the suggested functions to my script as I could, but it still fails.

In the official documentation I read that there are two kinds of seeds: global and operation-level. Maybe the key to solving my problem is setting the operation-level seed, but I don't understand where to apply it.

Could you please help me achieve reproducible results with TensorFlow (version > 2.0)? Thank you very much.

from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from keras.optimizers import adam
from sklearn.preprocessing import MinMaxScaler


np.random.seed(7)
import tensorflow as tf
tf.random.set_seed(7) #analogue of set_random_seed(seed_value)
import random
random.seed(7)
tf.random.uniform([1], seed=1)
tf.Graph.as_default #analogue of  tf.get_default_graph().finalize()

rng = tf.random.experimental.Generator.from_seed(1234)
rng.uniform((), 5, 10, tf.int64)  # draw a random scalar (0-D tensor) between 5 and 10

df = pd.read_csv("s54.csv", 
                 delimiter = ';', 
                 decimal=',', 
                 dtype = object).apply(pd.to_numeric).fillna(0)

#data normalization
scaler = MinMaxScaler() 
scaled_values = scaler.fit_transform(df) 
df.loc[:,:] = scaled_values


X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,1:],
                                                    df.iloc[:,:1],
                                                    test_size=0.2,
                                                    random_state=7,
                                                    stratify = df.iloc[:,:1])

model = Sequential()
model.add(Dense(1200, input_dim=len(X_train.columns), activation='relu'))  
model.add(Dense(150, activation='relu'))
model.add(Dense(80, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid')) 

loss="binary_crossentropy"
optimizer=adam(lr=0.01)
metrics=['accuracy']
epochs = 2
batch_size = 32
verbose = 0

model.compile(loss=loss,  
              optimizer=optimizer, 
              metrics=metrics) 
model.fit(X_train, y_train, epochs = epochs, batch_size=batch_size, verbose = verbose)
predictions = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, predictions>.5).ravel()

Recommended answer

For reference, the documentation for tf.random.set_seed says:
Operations that rely on a random seed actually derive it from two seeds: the global and operation-level seeds. This sets the global seed.

Its interactions with operation-level seeds are as follows:

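  1. If neither the global seed nor the operation seed is set: a randomly picked seed is used for this op.
  2. If the global seed is set but the operation seed is not: the system picks an operation seed from a stream of seeds determined by the global seed.
  3. If the operation seed is set but the global seed is not: a default global seed and the specified operation seed are used to determine the random sequence.
  4. If both the global seed and the operation seed are set: both seeds are used in conjunction to determine the random sequence.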

1st Scenario

Neither seed is set, so a random seed is picked by default. This is easy to see in the results: the values differ every time you re-run the program or call the code again.

x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
print(x_train)

2nd Scenario

The global seed is set but the operation seed is not. The first and second calls get different operation seeds, so they produce different values from each other. If you re-run or restart the code, however, both calls get the same seeds as before, so each one produces the same result over and over again.

tf.random.set_seed(2)
first = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
print(first)
sec = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
print(sec)
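
A quick way to check this behaviour without restarting anything is to reset the global seed and draw the same pair again. The sketch below assumes eager execution (the TensorFlow 2 default); the helper name draw_pair is made up for illustration and is not part of the original answer.

```python
import numpy as np
import tensorflow as tf


def draw_pair(global_seed):
    """Reset the global seed, then draw two tensors without an operation seed."""
    tf.random.set_seed(global_seed)
    first = tf.random.normal((10, 1), 1, 1, dtype=tf.float32)
    second = tf.random.normal((10, 1), 1, 1, dtype=tf.float32)
    return first.numpy(), second.numpy()


a1, a2 = draw_pair(2)
b1, b2 = draw_pair(2)

# The two draws inside one pair differ from each other ...
print("first equals second:", np.array_equal(a1, a2))
# ... but each draw is reproduced once the global seed is reset.
print("first reproduced:", np.array_equal(a1, b1))
print("second reproduced:", np.array_equal(a2, b2))
```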

3rd Scenario

Here the operation seed is set but the global seed is not. If you re-run the code in the same runtime, it gives different results, but if you restart the runtime, it gives the same sequence of results as the previous run.

x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32, seed=2)
print(x_train)
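
The "different results on re-run" part can be observed inside a single process by issuing the same seeded call twice. This is only a sketch, assuming eager execution and a fresh runtime in which tf.random.set_seed has not been called; the variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

# Same operation seed, no global seed: the second call continues the
# random sequence instead of starting it again.
run_1 = tf.random.normal((10, 1), 1, 1, dtype=tf.float32, seed=2).numpy()
run_2 = tf.random.normal((10, 1), 1, 1, dtype=tf.float32, seed=2).numpy()

print("second call repeats the first:", np.array_equal(run_1, run_2))
```

Restarting the runtime and running the two calls again should give the same run_1 and run_2 as before, which is the "same sequence" mentioned above.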

4th Scenario

Both seeds are used to determine the random sequence. Changing either the global or the operation seed gives different results, but restarting the runtime with the same seeds still gives the same results.

tf.random.set_seed(3)
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32, seed=1)
print(x_train) 
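
The effect of the two seeds together can be sketched as follows. The helper sample is hypothetical and exists only for this illustration; eager execution is assumed.

```python
import numpy as np
import tensorflow as tf


def sample(global_seed, op_seed):
    """Set the global seed, then draw with an explicit operation seed."""
    tf.random.set_seed(global_seed)
    return tf.random.normal((10, 1), 1, 1, dtype=tf.float32, seed=op_seed).numpy()


base = sample(3, 1)
same_seeds = sample(3, 1)         # both seeds unchanged
other_global = sample(4, 1)       # only the global seed changed
other_operation = sample(3, 2)    # only the operation seed changed

print("same seeds reproduce:", np.array_equal(base, same_seeds))
print("new global seed differs:", not np.array_equal(base, other_global))
print("new operation seed differs:", not np.array_equal(base, other_operation))
```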

I created a reproducible example as a reference.
With the global seed set, it always gives the same results.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

## GLOBAL SEED ##                                                   
tf.random.set_seed(3)
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
y_train = tf.math.sin(x_train)
x_test = tf.random.normal((10,1), 2, 3, dtype=tf.float32)
y_test = tf.math.sin(x_test)

model = Sequential()
model.add(Dense(1200, input_shape=(1,), activation='relu'))  
model.add(Dense(150, activation='relu'))
model.add(Dense(80, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid')) 

loss="binary_crossentropy"
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01)  # 'lr' is a deprecated alias
metrics=['mse']
epochs = 5
batch_size = 32
verbose = 1

model.compile(loss=loss,  
              optimizer=optimizer, 
              metrics=metrics) 
history = model.fit(x_train, y_train, epochs = epochs, batch_size=batch_size, verbose = verbose)
predictions = model.predict(x_test)
print(predictions)


Note: If you are using TensorFlow 2 or higher, Keras is already part of the API, so you should use tf.keras rather than the standalone keras package.
All of these examples were run on Google Colab.
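
For a script like the one in the question, which also relies on NumPy and Python's random module, one possible approach is to seed all three generators in one place and then train twice to confirm that the predictions match. This is a minimal sketch under assumptions, not the answerer's code: the helpers seed_everything and train_once, the tiny network and the synthetic data are all made up for illustration, and it is meant for CPU execution.

```python
import random

import numpy as np
import tensorflow as tf


def seed_everything(seed):
    """Seed Python, NumPy and TensorFlow (global seed) together."""
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)


def train_once(seed):
    """Build, train and evaluate a small model from a freshly seeded state."""
    seed_everything(seed)
    x = tf.random.normal((16, 1), 1, 1, dtype=tf.float32).numpy()
    y = np.sin(x)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss="mse",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
    model.fit(x, y, epochs=2, batch_size=8, verbose=0)
    return model.predict(x, verbose=0)


run_a = train_once(3)
run_b = train_once(3)
print("runs match:", np.allclose(run_a, run_b, atol=1e-5))
```

Recent TensorFlow 2 releases also provide tf.keras.utils.set_random_seed, which sets the same three seeds in one call, and tf.config.experimental.enable_op_determinism, which may be needed on a GPU where some kernels are otherwise nondeterministic. Check that your installed version includes them before relying on either.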
