如何更新两层多层感知器中的学习率? [英] How to update the learning rate in a two layered multi-layered perceptron?
问题描述
给出异或问题:
X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T
一个简单的
- 具有 的两层多层感知器(MLP) 它们与之间的
- S形激活
- 均方误差(MSE)作为损失函数/优化准则
- two layered Multi-Layered Perceptron (MLP) with
- sigmoid activations between them and
- Mean Square Error (MSE) as the loss function/optimization criterion
[代码]:
def sigmoid(x): # Returns values that sums to one.
return 1 / (1 + np.exp(-x))
def sigmoid_derivative(sx): # For backpropagation.
# See https://math.stackexchange.com/a/1225116
return sx * (1 - sx)
# Cost functions.
def mse(predicted, truth):
return np.sum(np.square(truth - predicted))
X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T
# Define the shape of the weight vector.
num_data, input_dim = X.shape
# Lets set the dimensions for the intermediate layer.
hidden_dim = 5
# Initialize weights between the input layers and the hidden layer.
W1 = np.random.random((input_dim, hidden_dim))
# Define the shape of the output vector.
output_dim = len(Y.T)
# Initialize weights between the hidden layers and the output layer.
W2 = np.random.random((hidden_dim, output_dim))
并给出停止标准为固定编号.固定学习率为0.3的时期(通过X和Y进行的迭代次数):
And given the stopping criteria as a fixed no. of epochs (no. of iterations through the X and Y) with a fixed learning rate of 0.3:
# Initialize weigh
num_epochs = 10000
learning_rate = 0.3
当我进行前后传播并在每个时期更新权重时,我应该如何更新权重?
When I run through the forward-backward propagation and update the weights in each epoch, how should I update the weights?
我试图简单地将学习率的乘积与反向传播导数的点积与层输出相加,但是该模型仍然仅在一个方向上更新了权重,导致所有权重降低到接近零.
I tried to simply add the product of the learning rate with the dot product of the backpropagated derivative with the layer outputs but the model still only updated the weights in one direction causing all the weights to degrade to near zero.
for epoch_n in range(num_epochs):
layer0 = X
# Forward propagation.
# Inside the perceptron, Step 2.
layer1 = sigmoid(np.dot(layer0, W1))
layer2 = sigmoid(np.dot(layer1, W2))
# Back propagation (Y -> layer2)
# How much did we miss in the predictions?
layer2_error = mse(layer2, Y)
#print(layer2_error)
# In what direction is the target value?
# Were we really close? If so, don't change too much.
layer2_delta = layer2_error * sigmoid_derivative(layer2)
# Back propagation (layer2 -> layer1)
# How much did each layer1 value contribute to the layer2 error (according to the weights)?
layer1_error = np.dot(layer2_delta, W2.T)
layer1_delta = layer1_error * sigmoid_derivative(layer1)
# update weights
W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
#print(np.dot(layer0.T, layer1_delta))
#print(epoch_n, list((layer2)))
# Log the loss value as we proceed through the epochs.
losses.append(layer2_error.mean())
权重应如何正确更新?
完整代码:
from itertools import chain
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)
def sigmoid(x): # Returns values that sums to one.
return 1 / (1 + np.exp(-x))
def sigmoid_derivative(sx):
# See https://math.stackexchange.com/a/1225116
return sx * (1 - sx)
# Cost functions.
def mse(predicted, truth):
return np.sum(np.square(truth - predicted))
X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T
# Define the shape of the weight vector.
num_data, input_dim = X.shape
# Lets set the dimensions for the intermediate layer.
hidden_dim = 5
# Initialize weights between the input layers and the hidden layer.
W1 = np.random.random((input_dim, hidden_dim))
# Define the shape of the output vector.
output_dim = len(Y.T)
# Initialize weights between the hidden layers and the output layer.
W2 = np.random.random((hidden_dim, output_dim))
# Initialize weigh
num_epochs = 10000
learning_rate = 0.3
losses = []
for epoch_n in range(num_epochs):
layer0 = X
# Forward propagation.
# Inside the perceptron, Step 2.
layer1 = sigmoid(np.dot(layer0, W1))
layer2 = sigmoid(np.dot(layer1, W2))
# Back propagation (Y -> layer2)
# How much did we miss in the predictions?
layer2_error = mse(layer2, Y)
#print(layer2_error)
# In what direction is the target value?
# Were we really close? If so, don't change too much.
layer2_delta = layer2_error * sigmoid_derivative(layer2)
# Back propagation (layer2 -> layer1)
# How much did each layer1 value contribute to the layer2 error (according to the weights)?
layer1_error = np.dot(layer2_delta, W2.T)
layer1_delta = layer1_error * sigmoid_derivative(layer1)
# update weights
W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
#print(np.dot(layer0.T, layer1_delta))
#print(epoch_n, list((layer2)))
# Log the loss value as we proceed through the epochs.
losses.append(layer2_error.mean())
# Visualize the losses
plt.plot(losses)
plt.show()
我在反向传播过程中缺少任何东西吗?
也许我错过了从成本到第二层的导数?
我意识到我错过了从成本到第二层以及添加后的偏导数:
I realized I missed the partial derivative from the cost to the second layer and after adding it:
# Cost functions.
def mse(predicted, truth):
return 0.5 * np.sum(np.square(predicted - truth)).mean()
def mse_derivative(predicted, truth):
return predicted - truth
具有跨各个时期的更新的反向传播循环:
With the updated backpropagation loop across epochs:
for epoch_n in range(num_epochs):
layer0 = X
# Forward propagation.
# Inside the perceptron, Step 2.
layer1 = sigmoid(np.dot(layer0, W1))
layer2 = sigmoid(np.dot(layer1, W2))
# Back propagation (Y -> layer2)
# How much did we miss in the predictions?
cost_error = mse(layer2, Y)
cost_delta = mse_derivative(layer2, Y)
#print(layer2_error)
# In what direction is the target value?
# Were we really close? If so, don't change too much.
layer2_error = np.dot(cost_delta, cost_error)
layer2_delta = layer2_error * sigmoid_derivative(layer2)
# Back propagation (layer2 -> layer1)
# How much did each layer1 value contribute to the layer2 error (according to the weights)?
layer1_error = np.dot(layer2_delta, W2.T)
layer1_delta = layer1_error * sigmoid_derivative(layer1)
# update weights
W2 += - learning_rate * np.dot(layer1.T, layer2_delta)
W1 += - learning_rate * np.dot(layer0.T, layer1_delta)
这似乎是在训练和学习XOR ...
It seemed to train and learn the XOR...
但是现在问题来了,layer2_error
和layer2_delta
是否正确计算,即以下代码部分正确吗?
But now the question begets, is the layer2_error
and layer2_delta
computed correctly, i.e. is the following part of the code correct?
# How much did we miss in the predictions?
cost_error = mse(layer2, Y)
cost_delta = mse_derivative(layer2, Y)
#print(layer2_error)
# In what direction is the target value?
# Were we really close? If so, don't change too much.
layer2_error = np.dot(cost_delta, cost_error)
layer2_delta = layer2_error * sigmoid_derivative(layer2)
在cost_delta
和cost_error
上为layer2_error
做点积是否正确?还是layer2_error
等于cost_delta
?
Is it correct to do a dot product on the cost_delta
and cost_error
for the layer2_error
? Or would layer2_error
just be equals to cost_delta
?
即
# How much did we miss in the predictions?
cost_error = mse(layer2, Y)
cost_delta = mse_derivative(layer2, Y)
#print(layer2_error)
# In what direction is the target value?
# Were we really close? If so, don't change too much.
layer2_error = cost_delta
layer2_delta = layer2_error * sigmoid_derivative(layer2)
推荐答案
是的,在更新权重时,将残差(cost_error
)与增量值相乘是正确的.
Yes, it is correct to multiply the residuals (cost_error
) with the delta values when we update the weights.
但是,因为cost_error
是标量,所以是否进行点积并不重要.因此,一个简单的乘法就足够了.但是,我们绝对必须乘以成本函数的梯度,因为这是我们开始反向传播的地方(即它是向后传递的入口).
However, it doesn't really matter whether do dot product or not since it cost_error
is a scalar. So, a simple multiplication is enough. But, we definitely have to multiply the gradient of the cost function because that's where we start our backprop (i.e. it's the entry point for backward pass).
此外,可以简化以下功能:
Also, the below function can be simplified:
def mse(predicted, truth):
return 0.5 * np.sum(np.square(predicted - truth)).mean()
为
def mse(predicted, truth):
return 0.5 * np.mean(np.square(predicted - truth))
这篇关于如何更新两层多层感知器中的学习率?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!