Batch size for Stochastic gradient descent is length of training data and not 1?


Problem description

I am trying to plot the different learning outcome when using Batch gradient descent, Stochastic gradient descent and mini-batch stochastic gradient descent.

Everywhere I look, I read that batch_size=1 is the same as plain SGD and batch_size=len(train_data) is the same as batch gradient descent.

I know that stochastic gradient descent uses only a single data sample for every update, while batch gradient descent uses the entire training data set to compute the gradient of the objective function for each update.
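(For concreteness, here is a minimal NumPy sketch, not from the question, contrasting one SGD update with one batch-GD update for a linear model trained with mean squared error; the data shapes and learning rate are arbitrary illustrative choices.)

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))        #toy training inputs
y = X @ np.array([1.0, -2.0, 0.5])    #toy targets
w = np.zeros(3)                       #model weights
lr = 0.01                             #learning rate

#Stochastic gradient descent: gradient of the squared error of a single sample
i = rng.integers(len(X))
grad_sgd = 2 * (X[i] @ w - y[i]) * X[i]
w_sgd = w - lr * grad_sgd

#Batch gradient descent: gradient of the mean squared error over the whole training set
grad_gd = 2 * X.T @ (X @ w - y) / len(X)
w_gd = w - lr * grad_gd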

However, when setting batch_size in Keras, the opposite seems to be happening. Take my code for example, where I have set batch_size equal to the length of my training data:

import tensorflow as tf
from tensorflow import keras
import tensorflow_addons as tfa

#train_dataset, normed_train_data and train_labels are assumed to be defined earlier
input_size = len(train_dataset.keys())
output_size = 10
hidden_layer_size = 250
n_epochs = 250

weights_initializer = keras.initializers.GlorotUniform()

#A function that trains and validates the model and returns the MSE
def train_val_model(run_dir, hparams):
    model = keras.models.Sequential([
            #Layer to be used as an entry point into a Network
            keras.layers.InputLayer(input_shape=[len(train_dataset.keys())]),
            #Dense layer 1
            keras.layers.Dense(hidden_layer_size, activation='relu', 
                               kernel_initializer = weights_initializer,
                               name='Layer_1'),
            #Dense layer 2
            keras.layers.Dense(hidden_layer_size, activation='relu', 
                               kernel_initializer = weights_initializer,
                               name='Layer_2'),
            #activation function is linear since we are doing regression
            keras.layers.Dense(output_size, activation='linear', name='Output_layer')
                                ])
    
    #Use the stochastic gradient descent optimizer, but change batch_size to get batch GD, SGD or mini-batch SGD
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.0,
                                        nesterov=False)
    
    #Compiling the model
    model.compile(optimizer=optimizer, 
                  loss='mean_squared_error', #Computes the mean of squares of errors between labels and predictions
                  metrics=['mean_squared_error']) #Computes the mean squared error between y_true and y_pred
    
    # initialize TimeStopping callback 
    time_stopping_callback = tfa.callbacks.TimeStopping(seconds=5*60, verbose=1)
    
    #Training the network
    history = model.fit(normed_train_data, train_labels, 
         epochs=n_epochs,
         batch_size=hparams['batch_size'], 
         verbose=1,
         #validation_split=0.2,
         callbacks=[tf.keras.callbacks.TensorBoard(run_dir + "/Keras"), time_stopping_callback])
    
    return history

train_val_model("logs/sample", {'batch_size': len(normed_train_data)})
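
(For reference, a minimal sketch of how the same function could be called to compare the three regimes, assuming the train_val_model and normed_train_data defined above; the mini-batch size of 32 is an arbitrary illustrative value, not something from the question.)

#Illustrative only: one call per gradient-descent regime
history_gd = train_val_model("logs/batch_gd", {'batch_size': len(normed_train_data)})  #batch GD: one update per epoch
history_mb = train_val_model("logs/mini_batch", {'batch_size': 32})                    #mini-batch GD: ceil(N/32) updates per epoch
history_sgd = train_val_model("logs/sgd", {'batch_size': 1})                           #SGD: N updates per epoch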

When running this, the output seems to show a single update per epoch, i.e. SGD:

As can be seen, underneath every epoch it says 1/1, which I assume means a single update iteration. If, on the other hand, I set batch_size=1, I get 90000/90000, which is the size of my entire data set (training-time wise this also makes sense).

So, my question is: is batch_size=1 actually batch gradient descent and not stochastic gradient descent, and is batch_size=len(train_data) actually stochastic gradient descent and not batch gradient descent?

Recommended answer

There are actually three (3) cases:

  • batch_size = 1 means indeed stochastic gradient descent (SGD)
  • A batch_size equal to the whole of the training data is (batch) gradient descent (GD)
  • Intermediate cases (which are actually used in practice) are usually referred to as mini-batch gradient descent

See A Gentle Introduction to Mini-Batch Gradient Descent and How to Configure Batch Size for more details and references. Truth is, in practice, when we say "SGD" we usually mean "mini-batch SGD".

These definitions are in fact fully compliant with what you report from your experiments:

  • With batch_size=len(train_data) (GD case), only one update is indeed expected per epoch (since there is only one batch), hence the 1/1 indication in Keras output.

  • In contrast, with batch_size = 1 (SGD case), you expect as many updates as samples in your training data (since this is now the number of your batches), i.e. 90000, hence the 90000/90000 indication in Keras output.

i.e. the number of updates per epoch (which Keras indicates) is equal to the number of batches used (and not to the batch size).
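
As a quick check of that arithmetic (90000 is the sample count from the question; 32 is just an arbitrary mini-batch example):

import math

n_samples = 90000
for batch_size in (1, 32, n_samples):
    updates_per_epoch = math.ceil(n_samples / batch_size)
    print(f"batch_size={batch_size}: {updates_per_epoch} updates per epoch")
#batch_size=1     -> 90000 updates per epoch (the 90000/90000 readout)
#batch_size=32    -> 2813 updates per epoch
#batch_size=90000 -> 1 update per epoch (the 1/1 readout)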
