How to update the learning rate in a two layered multi-layered perceptron?


Problem description

Given the XOR problem:

X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T

A simple

  • two layered Multi-Layered Perceptron (MLP) with
  • sigmoid activations between them and
  • Mean Square Error (MSE) as the loss function/optimization criterion

[code]:

def sigmoid(x): # Logistic sigmoid; maps values into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sx): # For backpropagation.
    # See https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)

# Cost functions.
def mse(predicted, truth):
    return np.sum(np.square(truth - predicted))

X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T

# Define the shape of the weight vector.
num_data, input_dim = X.shape
# Let's set the dimensions for the intermediate layer.
hidden_dim = 5
# Initialize weights between the input layers and the hidden layer.
W1 = np.random.random((input_dim, hidden_dim))

# Define the shape of the output vector. 
output_dim = len(Y.T)
# Initialize weights between the hidden layers and the output layer.
W2 = np.random.random((hidden_dim, output_dim))
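
As a quick shape check (an illustrative aside, using the arrays defined above): W1 maps the 2 input features to the 5 hidden units and W2 maps the 5 hidden units to the single output, so a full forward pass over the four XOR rows yields a (4, 1) matrix of predictions.

# Illustrative shape check: X is (4, 2), W1 is (2, 5), W2 is (5, 1).
print(X.shape, W1.shape, W2.shape)                        # (4, 2) (2, 5) (5, 1)
print(sigmoid(np.dot(sigmoid(np.dot(X, W1)), W2)).shape)  # (4, 1): one prediction per sample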

And given the stopping criteria as a fixed no. of epochs (no. of iterations through the X and Y) with a fixed learning rate of 0.3:

# Number of epochs and learning rate.
num_epochs = 10000
learning_rate = 0.3

When I run through the forward-backward propagation and update the weights in each epoch, how should I update the weights?

I tried to simply add the product of the learning rate with the dot product of the backpropagated derivative with the layer outputs but the model still only updated the weights in one direction causing all the weights to degrade to near zero.

for epoch_n in range(num_epochs):
    layer0 = X
    # Forward propagation.

    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(layer0, W1))
    layer2 = sigmoid(np.dot(layer1, W2))

    # Back propagation (Y -> layer2)

    # How much did we miss in the predictions?
    layer2_error = mse(layer2, Y)

    #print(layer2_error)
    # In what direction is the target value?
    # Were we really close? If so, don't change too much.
    layer2_delta = layer2_error * sigmoid_derivative(layer2)

    # Back propagation (layer2 -> layer1)
    # How much did each layer1 value contribute to the layer2 error (according to the weights)?
    layer1_error = np.dot(layer2_delta, W2.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # update weights
    W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
    W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
    #print(np.dot(layer0.T, layer1_delta))
    #print(epoch_n, list((layer2)))

    # Log the loss value as we proceed through the epochs.
    losses.append(layer2_error.mean())
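
One way to see why everything drifts in a single direction (an illustrative check using the definitions above, outside the training loop): mse returns a single non-negative scalar and the sigmoid outputs lie in (0, 1), so every entry of layer2_error * sigmoid_derivative(layer2) is non-negative and the W2 update can never change sign.

# Illustrative check: with a scalar, non-negative "error", the delta carries no sign
# information, so the gradient step for W2 always points the same way.
l1 = sigmoid(np.dot(X, W1))
l2 = sigmoid(np.dot(l1, W2))
scalar_error = mse(l2, Y)                       # a single value >= 0
delta = scalar_error * sigmoid_derivative(l2)   # every entry >= 0
print((np.dot(l1.T, delta) >= 0).all())         # True -> W2 only ever decreases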

How should the weights be updated correctly?

Full code:

from itertools import chain
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)

def sigmoid(x): # Logistic sigmoid; maps values into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(sx):
    # See https://math.stackexchange.com/a/1225116
    return sx * (1 - sx)

# Cost functions.
def mse(predicted, truth):
    return np.sum(np.square(truth - predicted))

X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T

# Define the shape of the weight vector.
num_data, input_dim = X.shape
# Let's set the dimensions for the intermediate layer.
hidden_dim = 5
# Initialize weights between the input layers and the hidden layer.
W1 = np.random.random((input_dim, hidden_dim))

# Define the shape of the output vector. 
output_dim = len(Y.T)
# Initialize weights between the hidden layers and the output layer.
W2 = np.random.random((hidden_dim, output_dim))

# Number of epochs and learning rate.
num_epochs = 10000
learning_rate = 0.3

losses = []

for epoch_n in range(num_epochs):
    layer0 = X
    # Forward propagation.

    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(layer0, W1))
    layer2 = sigmoid(np.dot(layer1, W2))

    # Back propagation (Y -> layer2)

    # How much did we miss in the predictions?
    layer2_error = mse(layer2, Y)

    #print(layer2_error)
    # In what direction is the target value?
    # Were we really close? If so, don't change too much.
    layer2_delta = layer2_error * sigmoid_derivative(layer2)

    # Back propagation (layer2 -> layer1)
    # How much did each layer1 value contribute to the layer2 error (according to the weights)?
    layer1_error = np.dot(layer2_delta, W2.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # update weights
    W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
    W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
    #print(np.dot(layer0.T, layer1_delta))
    #print(epoch_n, list((layer2)))

    # Log the loss value as we proceed through the epochs.
    losses.append(layer2_error.mean())

# Visualize the losses
plt.plot(losses)
plt.show()

Am I missing anything in the back-propagation?

Maybe I missed out the derivative from the cost to the second layer?

I realized I missed the partial derivative from the cost to the second layer and after adding it:

# Cost functions.
def mse(predicted, truth):
    return 0.5 * np.sum(np.square(predicted - truth)).mean()

def mse_derivative(predicted, truth):
    return predicted - truth
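
A quick finite-difference sanity check of this derivative (illustrative, using the mse and mse_derivative defined above together with the XOR targets Y):

# Numerically verify that mse_derivative matches the gradient of mse.
pred = np.array([[0.2], [0.7], [0.9], [0.4]])   # made-up predictions
eps = 1e-6
numeric = np.zeros_like(pred)
for i in range(pred.shape[0]):
    bumped = pred.copy()
    bumped[i, 0] += eps
    numeric[i, 0] = (mse(bumped, Y) - mse(pred, Y)) / eps
print(np.allclose(numeric, mse_derivative(pred, Y), atol=1e-4))  # True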

With the updated backpropagation loop across epochs:

for epoch_n in range(num_epochs):
    layer0 = X
    # Forward propagation.

    # Inside the perceptron, Step 2. 
    layer1 = sigmoid(np.dot(layer0, W1))
    layer2 = sigmoid(np.dot(layer1, W2))

    # Back propagation (Y -> layer2)

    # How much did we miss in the predictions?
    cost_error = mse(layer2, Y)
    cost_delta = mse_derivative(layer2, Y)

    #print(layer2_error)
    # In what direction is the target value?
    # Were we really close? If so, don't change too much.
    layer2_error = np.dot(cost_delta, cost_error)
    layer2_delta = layer2_error *  sigmoid_derivative(layer2)

    # Back propagation (layer2 -> layer1)
    # How much did each layer1 value contribute to the layer2 error (according to the weights)?
    layer1_error = np.dot(layer2_delta, W2.T)
    layer1_delta = layer1_error * sigmoid_derivative(layer1)

    # update weights
    W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
    W1 += - learning_rate * np.dot(layer0.T, layer1_delta)

It seemed to train and learn the XOR...
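
A quick way to confirm this (illustrative, assuming the updated loop above has just run) is to compare the rounded network outputs against the XOR targets:

# After training, the rounded predictions should reproduce the XOR truth table.
hidden = sigmoid(np.dot(X, W1))
preds = sigmoid(np.dot(hidden, W2))
print(np.round(preds).ravel())   # expected: [0. 1. 1. 0.]
print(Y.ravel())                 # [0 1 1 0]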

But now the question arises: are layer2_error and layer2_delta computed correctly, i.e. is the following part of the code correct?

# How much did we miss in the predictions?
cost_error = mse(layer2, Y)
cost_delta = mse_derivative(layer2, Y)

#print(layer2_error)
# In what direction is the target value?
# Were we really close? If so, don't change too much.
layer2_error = np.dot(cost_delta, cost_error)
layer2_delta = layer2_error *  sigmoid_derivative(layer2)

Is it correct to do a dot product on the cost_delta and cost_error for the layer2_error? Or would layer2_error just be equal to cost_delta?

# How much did we miss in the predictions?
cost_error = mse(layer2, Y)
cost_delta = mse_derivative(layer2, Y)

#print(layer2_error)
# In what direction is the target value?
# Were we really close? If so, don't change too much.
layer2_error = cost_delta
layer2_delta = layer2_error *  sigmoid_derivative(layer2)

Recommended answer

Yes, it is correct to multiply the residuals (cost_error) with the delta values when we update the weights.

However, it doesn't really matter whether you do the dot product or not, since cost_error is a scalar. So a simple multiplication is enough. But we definitely have to multiply by the gradient of the cost function, because that's where we start our backprop (i.e. it's the entry point for the backward pass).
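
To make the scalar point concrete, a small illustrative check (the numbers are made up):

# Because cost_error is a scalar, np.dot degenerates to a plain multiplication.
cost_delta = np.array([[0.2], [-0.3], [-0.1], [0.4]])   # e.g. predicted - truth
cost_error = 0.15                                       # e.g. a scalar loss value
print(np.allclose(np.dot(cost_delta, cost_error), cost_delta * cost_error))  # True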

Also, the below function can be simplified:

# Original:
def mse(predicted, truth):
    return 0.5 * np.sum(np.square(predicted - truth)).mean()

# Simplified:
def mse(predicted, truth):
    return 0.5 * np.mean(np.square(predicted - truth))
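
For reference, a small illustrative comparison of the two versions on made-up predictions: since np.sum already returns a scalar, the trailing .mean() in the first version is a no-op, so the two versions differ only by the constant factor num_data, which merely rescales the gradient and can be absorbed into the learning rate.

# Illustrative comparison: the first version is 0.5 * sum(...), the second 0.5 * mean(...).
pred = np.array([[0.2], [0.7], [0.9], [0.4]])
v_sum = 0.5 * np.sum(np.square(pred - Y)).mean()
v_mean = 0.5 * np.mean(np.square(pred - Y))
print(v_sum, v_mean, v_sum / v_mean)   # 0.15 0.0375 4.0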
