modify perceptron to become gradient descent
Question
According to this video, the substantive difference between the perceptron and gradient descent algorithms is quite minor. They specified it as essentially:
Perceptron: Δwi = η(y - ŷ)xi

Gradient Descent: Δwi = η(y - α)xi
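Side by side, the two rules differ only in which prediction gets subtracted from the target: the perceptron uses the thresholded output ŷ, while gradient descent uses the continuous activation α. A minimal sketch of just the per-weight delta (the class and method names here are illustrative, not from the code below):

```java
public class UpdateRules {
    // Perceptron: delta_wi = eta * (y - yHat) * xi, where yHat is the
    // thresholded output (0 or 1).
    static double perceptronDelta(double eta, double y, double yHat, double xi) {
        return eta * (y - yHat) * xi;
    }

    // Gradient descent: delta_wi = eta * (y - alpha) * xi, where alpha is the
    // raw (continuous) activation rather than a hard 0/1 prediction.
    static double gradientDescentDelta(double eta, double y, double alpha, double xi) {
        return eta * (y - alpha) * xi;
    }

    public static void main(String[] args) {
        double eta = 0.1, y = 1.0, xi = 2.0;
        double yHat = 0.0;  // thresholded prediction
        double alpha = 0.3; // continuous activation
        System.out.println(perceptronDelta(eta, y, yHat, xi));       // → 0.2
        System.out.println(gradientDescentDelta(eta, y, alpha, xi)); // ≈ 0.14
    }
}
```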
I've implemented a working version of the perceptron algorithm, but I don't understand what sections I need to change to turn it into gradient descent.
Below are the load-bearing portions of my perceptron code; I suppose these are the components I need to modify. But where? What do I need to change? I don't understand.
This is for pedagogical reasons. I've sort of figured this out, but I'm still confused about the gradient; see the UPDATE below.
iteration = 0;
do
{
    iteration++;
    globalError = 0;
    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // calculate predicted class
        output = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );
        // difference between predicted and actual class values
        localError = outputs__train[p] - output;
        //update weights and bias
        for (int i = 0; i < globo_dict_size; i++)
        {
            weights[i] += ( LEARNING_RATE * localError * feature_matrix__train[p][i] );
        }
        weights[ globo_dict_size ] += ( LEARNING_RATE * localError );
        //summation of squared error (error value for all instances)
        globalError += (localError*localError);
    }
    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
}
while(globalError != 0 && iteration<=MAX_ITER);
This is the crux of my perceptron:
static int calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3];
    double sum = 0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        sum += ( weights[i] * feature_matrix[file_index][i] );
    }
    //bias
    sum += weights[ globo_dict_size ];
    return (sum >= theta) ? 1 : 0;
}
Is it just that I replace that calculateOutput method with something like this:
public static double[] gradientDescent(final double [] theta_in, final double alpha, final int num_iters, double[][] data )
{
    final double m = data.length;
    double [] theta = theta_in;
    double theta0 = 0;
    double theta1 = 0;
    for (int i = 0; i < num_iters; i++)
    {
        final double sum0 = gradientDescentSumScalar0(theta, alpha, data );
        final double sum1 = gradientDescentSumScalar1(theta, alpha, data);
        theta0 = theta[0] - ( (alpha / m) * sum0 );
        theta1 = theta[1] - ( (alpha / m) * sum1 );
        theta = new double [] { theta0, theta1 };
    }
    return theta;
}
UPDATE
At this point I think I'm very close.
I understand how to calculate the hypothesis and I think I've done that correctly, but nevertheless something remains terribly wrong with this code. I'm pretty sure it has something to do with my calculation of the gradient. When I run it, the error fluctuates wildly, then goes to infinity, then just NaN.
double cost, error, hypothesis;
double[] gradient;
int p, iteration;

iteration = 0;
do
{
    iteration++;
    error = 0.0;
    cost = 0.0;
    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        // 1. Calculate the hypothesis h = X * theta
        hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );
        // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
        cost = hypothesis - outputs__train[p];
        // 3. Calculate the gradient = X' * loss / m
        gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, cost, number_of_files__train );
        // 4. Update the parameters theta = theta - alpha * gradient
        for (int i = 0; i < globo_dict_size; i++)
        {
            theta[i] = theta[i] - LEARNING_RATE * gradient[i];
        }
    }
    //summation of squared error (error value for all instances)
    error += (cost*cost);
    /* Root Mean Squared Error */
    if (iteration < 10)
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( error/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( error/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );
}
while(cost != 0 && iteration<=MAX_ITER);
static double calculateHypothesis( double[] theta, double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        hypothesis += ( theta[i] * feature_matrix[file_index][i] );
    }
    //bias
    hypothesis += theta[ globo_dict_size ];
    return hypothesis;
}

static double[] calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double cost, int number_of_files__train )
{
    double m = number_of_files__train;
    double[] gradient = new double[ globo_dict_size ]; //one for bias?
    for (int i = 0; i < gradient.length; i++)
    {
        gradient[i] = (1.0/m) * cost * feature_matrix[ file_index ][ i ];
    }
    return gradient;
}
Answer
The perceptron rule is just an approximation to gradient descent for when you have a non-differentiable activation function like (sum >= theta) ? 1 : 0. As they ask at the end of the video, you cannot use gradients there because this threshold function isn't differentiable (well, its gradient is not defined for x = 0 and the gradient is zero everywhere else). If, instead of this thresholding, you had a smooth function like the sigmoid, you could calculate the actual gradients.
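To make that concrete, a sigmoid and its derivative could be sketched like this (the helper names are illustrative, not from the question's code):

```java
public class Sigmoid {
    // Smooth, differentiable activation: sigma(x) = 1 / (1 + e^-x)
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Its derivative has the convenient closed form sigma(x) * (1 - sigma(x)),
    // which is nonzero everywhere -- unlike the step function's gradient.
    static double sigmoidGradient(double x) {
        double s = sigmoid(x);
        return s * (1.0 - s);
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0));         // → 0.5
        System.out.println(sigmoidGradient(0.0)); // → 0.25, the derivative's maximum
    }
}
```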
In that case your weight update would be LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]. For the case of the sigmoid, the link I sent you also shows how to calculate the output_gradient.
In summary, to change from the perceptron to gradient descent you have to:
- Use an activation function whose derivative (gradient) is not zero everywhere.
- Apply the chain rule to define the new update rule.
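Putting those two points together, the question's update step could be adapted along these lines. This is a sketch under assumptions, not a drop-in replacement: the sigmoid helpers are illustrative, and the bias is kept in the last weight slot as in the question's code.

```java
public class GradientDescentStep {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // One stochastic gradient-descent update for a single training example.
    // Unlike the perceptron, 'activation' is the raw sigmoid output, not a
    // thresholded 0/1 prediction.
    static void updateWeights(double[] weights, double[] features, double target,
                              double learningRate) {
        int n = features.length;
        double sum = weights[n]; // bias term stored in the last slot
        for (int i = 0; i < n; i++) {
            sum += weights[i] * features[i];
        }
        double activation = sigmoid(sum);
        // Chain rule for squared error with a sigmoid output:
        // delta = (y - a) * a * (1 - a), where a * (1 - a) is the sigmoid's gradient
        double delta = (target - activation) * activation * (1.0 - activation);
        for (int i = 0; i < n; i++) {
            weights[i] += learningRate * delta * features[i];
        }
        weights[n] += learningRate * delta; // bias update (input fixed at 1)
    }
}
```

Repeated over the training set, each call nudges the activation toward the target, and because the sigmoid's gradient is never exactly zero, the update never gets stuck the way the step function's zero gradient would leave it.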