How to Get Reproducible Results (Keras, TensorFlow)

Question

To make the results reproducible, I've read more than 20 articles and added as many of the suggested seeding functions to my script as I could... but it still fails.

In the official documentation I read that there are 2 kinds of seeds: global and operation-level. Maybe the key to solving my problem is setting the operation-level seed, but I don't understand where to apply it.

Could you please help me achieve reproducible results with TensorFlow (version > 2.0)? Thank you very much.

from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from keras.optimizers import adam
from sklearn.preprocessing import MinMaxScaler


np.random.seed(7)
import tensorflow as tf
tf.random.set_seed(7) #analogue of set_random_seed(seed_value)
import random
random.seed(7)
tf.random.uniform([1], seed=1)
tf.Graph.as_default #analogue of  tf.get_default_graph().finalize()

rng = tf.random.experimental.Generator.from_seed(1234)
rng.uniform((), 5, 10, tf.int64)  # draw a random scalar (0-D tensor) between 5 and 10

df = pd.read_csv("s54.csv", 
                 delimiter = ';', 
                 decimal=',', 
                 dtype = object).apply(pd.to_numeric).fillna(0)

#data normalization
scaler = MinMaxScaler() 
scaled_values = scaler.fit_transform(df) 
df.loc[:,:] = scaled_values


X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,1:],
                                                    df.iloc[:,:1],
                                                    test_size=0.2,
                                                    random_state=7,
                                                    stratify = df.iloc[:,:1])

model = Sequential()
model.add(Dense(1200, input_dim=len(X_train.columns), activation='relu'))  
model.add(Dense(150, activation='relu'))
model.add(Dense(80, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid')) 

loss="binary_crossentropy"
optimizer=adam(lr=0.01)
metrics=['accuracy']
epochs = 2
batch_size = 32
verbose = 0

model.compile(loss=loss,  
              optimizer=optimizer, 
              metrics=metrics) 
model.fit(X_train, y_train, epochs = epochs, batch_size=batch_size, verbose = verbose)
predictions = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, predictions>.5).ravel()

Recommended Answer

As a reference, the documentation for `tf.random.set_seed` says:
Operations that rely on a random seed actually derive it from two seeds: the global and operation-level seeds. This sets the global seed.

Its interactions with operation-level seeds are as follows:

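  1. If neither the global seed nor the operation seed is set: a randomly picked seed is used for this op.
  2. If the operation seed is not set but the global seed is set: the system picks an operation seed from a stream of seeds determined by the global seed.
  3. If the operation seed is set but the global seed is not set: a default global seed and the specified operation seed are used to determine the random sequence.
  4. If both the global and the operation seed are set: both seeds are used together to determine the random sequence.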
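You can check the rules above in a few lines of code. This is a minimal sketch that assumes TensorFlow 2.x running eagerly. `draw_pair` is a hypothetical helper written for illustration, not a TensorFlow API. The sketch relies on documented behavior: calling `tf.random.set_seed` again resets TensorFlow's internal seed counters, so the same sequence of draws comes back.

```python
import numpy as np
import tensorflow as tf

def draw_pair(global_seed, op_seed=None):
    """Hypothetical helper: reset the global seed, then draw twice from the same op."""
    tf.random.set_seed(global_seed)  # also resets TF's internal op-seed counters
    a = tf.random.uniform((3,), seed=op_seed)
    b = tf.random.uniform((3,), seed=op_seed)
    return a.numpy(), b.numpy()

# Rule 2: global seed only. Resetting it replays the same two draws.
a1, b1 = draw_pair(7)
a2, b2 = draw_pair(7)
print(np.array_equal(a1, a2), np.array_equal(b1, b2))
print(np.array_equal(a1, b1))  # consecutive draws within one run still differ

# Rule 4: global and operation seed together.
c1, d1 = draw_pair(7, op_seed=1)
c2, d2 = draw_pair(7, op_seed=1)
print(np.array_equal(c1, c2), np.array_equal(d1, d2))
```

In both cases the individual draws within one run are different from each other. What repeats is the whole sequence.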

1st Scenario

Neither seed is set, so a random seed is picked by default. You can see this in the output: the values are different every time you re-run the program or call the code multiple times.

import tensorflow as tf
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32)  # no global or operation seed
print(x_train)

2nd Scenario

The global seed is set, but the operation seed is not. `first` and `sec` get different values from each other. If you re-run or restart the code, both get the same values as before, so the same results are produced over and over again.

import tensorflow as tf
tf.random.set_seed(2)  # global seed only
first = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
print(first)
sec = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
print(sec)

3rd Scenario

The operation seed is set, but the global seed is not. If you re-run the code in the same session, it gives you different results. If you restart the runtime, you get the same sequence of results as in the previous run.

import tensorflow as tf
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32, seed=2)  # operation seed only
print(x_train)
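The next sketch shows this scenario's in-session behavior, plus a contrast. It assumes TensorFlow 2.x in eager mode. The second half uses a different technique: stateless random ops such as `tf.random.stateless_normal`. Their output depends only on the shape and an explicit two-integer seed. The value `[2, 0]` is an arbitrary choice here.

```python
import numpy as np
import tensorflow as tf

# Stateful op with only an operation seed: repeated calls in one session differ,
# because the cached kernel advances an internal counter on every call.
s1 = tf.random.normal((3,), seed=2)
s2 = tf.random.normal((3,), seed=2)
print(np.array_equal(s1.numpy(), s2.numpy()))

# Stateless op: the output is a pure function of (shape, seed),
# so every call with the same seed returns the same values.
seed = [2, 0]  # stateless ops take a shape-[2] integer seed
t1 = tf.random.stateless_normal((3,), seed=seed)
t2 = tf.random.stateless_normal((3,), seed=seed)
print(np.array_equal(t1.numpy(), t2.numpy()))
```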

4th Scenario

Both seeds are used to determine the random sequence. Changing the global or operation seed gives different results, but restarting the runtime with the same seeds still gives the same results.

import tensorflow as tf
tf.random.set_seed(3)  # global seed
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32, seed=1)  # plus operation seed
print(x_train) 
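This also bears on where an operation-level seed goes in a Keras model. Initializers and random layers such as `Dropout` accept a `seed` argument. Below is a minimal sketch that assumes `tf.keras` from TensorFlow 2.x. `build_model` is a hypothetical helper, and the layer sizes are arbitrary.

```python
import numpy as np
import tensorflow as tf

def build_model(seed):
    """Hypothetical helper: every random op in the model gets an explicit seed."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(
            8, activation="relu",
            kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed)),
        tf.keras.layers.Dropout(0.2, seed=seed),  # dropout masks get an op seed too
        tf.keras.layers.Dense(
            1, activation="sigmoid",
            kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed + 1)),
    ])

# Same seed -> identical initial weights; different seed -> different weights.
w_first = build_model(42).get_weights()
w_second = build_model(42).get_weights()
w_other = build_model(43).get_weights()
```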

Here is reproducible code for reference.
Because the global seed is set, it always gives the same results.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

## GLOBAL SEED ##                                                   
tf.random.set_seed(3)
x_train = tf.random.normal((10,1), 1, 1, dtype=tf.float32)
y_train = tf.math.sin(x_train)
x_test = tf.random.normal((10,1), 2, 3, dtype=tf.float32)
y_test = tf.math.sin(x_test)

model = Sequential()
model.add(Dense(1200, input_shape=(1,), activation='relu'))  
model.add(Dense(150, activation='relu'))
model.add(Dense(80, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid')) 

loss="binary_crossentropy"
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01)  # `lr` is deprecated in favor of `learning_rate`
metrics=['mse']
epochs = 5
batch_size = 32
verbose = 1

model.compile(loss=loss,  
              optimizer=optimizer, 
              metrics=metrics) 
history = model.fit(x_train, y_train, epochs = epochs, batch_size=batch_size, verbose = verbose)
predictions = model.predict(x_test)
print(predictions)
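One way to confirm that a training script is reproducible is to run it twice in the same process and compare the predictions. The sketch below assumes TensorFlow 2.7 or later, which provides `tf.keras.utils.set_random_seed`. That function seeds Python's `random`, NumPy and TensorFlow in one call. `train_once`, the toy data and the `mse` loss are illustrative choices, not part of the answer's code. The check is meant for CPU.

```python
import numpy as np
import tensorflow as tf

def train_once(seed):
    """Hypothetical helper: build, train and predict with everything seeded up front."""
    tf.keras.utils.set_random_seed(seed)  # Python, NumPy and TF seeds together
    x = np.random.normal(1.0, 1.0, size=(64, 1)).astype("float32")
    y = np.sin(x)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss="mse", optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
    model.fit(x, y, epochs=3, batch_size=16, shuffle=True, verbose=0)
    return model.predict(x[:5], verbose=0)

p1 = train_once(3)
p2 = train_once(3)
print(np.allclose(p1, p2))
```

Resetting all seeds at the top of `train_once` matters. Weight initialization, data generation and shuffling all draw from those generators, so they must start from the same state on each run.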


Note: If you are using TensorFlow 2.x, Keras is already included in its API, so you should use `tf.keras` rather than the standalone `keras` package.
All of these were run on Google Colab.
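Applied to the question's imports, the switch to `tf.keras` might look like the sketch below. It assumes TensorFlow 2.8 or later. `tf.keras.utils.set_random_seed` (2.7+) replaces the separate `random.seed`, `np.random.seed` and `tf.random.set_seed` calls. `tf.config.experimental.enable_op_determinism()` (2.8+) asks TensorFlow to use deterministic kernels, including on GPU, and raises an error for ops that have no deterministic implementation. `SEED = 7` mirrors the seed used in the question.

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam  # the class is `Adam`, not `adam`

SEED = 7
tf.keras.utils.set_random_seed(SEED)            # seeds Python, NumPy and TensorFlow
tf.config.experimental.enable_op_determinism()  # deterministic kernels (may be slower)

optimizer = Adam(learning_rate=0.01)  # `learning_rate` replaces the old `lr` argument
```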
