Calculate gradient output for Theta update rule


Problem description

Since this uses a sigmoid function instead of a zero/one (step) activation function, I assume this is the right way to calculate the output for gradient descent. Is that right?

  static double calculateOutput( int theta, double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
  {
     //double sum = x * weights[0] + y * weights[1] + z * weights[2] + weights[3];
     double sum = 0.0;

     for (int i = 0; i < globo_dict_size; i++) 
     {
         sum += ( weights[i] * feature_matrix[file_index][i] );
     }
     //bias
     sum += weights[ globo_dict_size ];

     return sigmoid(sum);
  }

  private static double sigmoid(double x)
  {
      return 1 / (1 + Math.exp(-x));
  }
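One property of the sigmoid above matters for gradient descent: its derivative can be written in terms of its own output, σ'(z) = σ(z)(1 − σ(z)), so it can be computed from the value calculateOutput already returns. A minimal Java sketch under that assumption (the class and method names are hypothetical; sigmoid() mirrors the method in the question), with a finite-difference check of the identity:

```java
// Sketch: the sigmoid and its derivative expressed through its output.
// Names are hypothetical; sigmoid() mirrors the method in the question.
class SigmoidDerivativeSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // sigma'(z) = sigma(z) * (1 - sigma(z)); takes the already computed output sigma(z)
    static double derivativeFromOutput(double output) {
        return output * (1.0 - output);
    }

    public static void main(String[] args) {
        double z = 0.7;
        double h = 1e-6;
        // Central finite difference as an independent estimate of sigma'(z)
        double numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h);
        double analytic = derivativeFromOutput(sigmoid(z));
        System.out.println("numeric=" + numeric + " analytic=" + analytic);
        System.out.println(derivativeFromOutput(sigmoid(0.0))); // prints 0.25
    }
}
```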

In the following code I'm trying to update my Θ values (equivalent to the weights in a perceptron, aren't they?). In my related question I was given the formula LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i] for that purpose. I commented out the weight update from my perceptron.

Is this new update rule the correct approach?

What is meant by output_gradient? Is that equivalent to the sum I calculate in my calculateOutput method?

      //LEARNING WEIGHTS
      double localError, globalError;
      int p, iteration;
      double output; // calculateOutput returns a double (sigmoid), so an int would not compile

      iteration = 0;
      do 
      {
          iteration++;
          globalError = 0;
          //loop through all instances (complete one epoch)
          for (p = 0; p < number_of_files__train; p++) 
          {
              // calculate predicted class
              output = calculateOutput( theta, weights, feature_matrix__train, p, globo_dict_size );
              // difference between predicted and actual class values
              localError = outputs__train[p] - output;
              //update weights and bias
              for (int i = 0; i < globo_dict_size; i++) 
              {
                  //weights[i] += ( LEARNING_RATE * localError * feature_matrix__train[p][i] );

                  weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i];

              }
              weights[ globo_dict_size ] += ( LEARNING_RATE * localError );

              //summation of squared error (error value for all instances)
              globalError += (localError*localError);
          }

          /* Root Mean Squared Error */
          if (iteration < 10) 
              System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
          else
              System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( globalError/number_of_files__train ) );
          //System.out.println( Arrays.toString( weights ) );
      } 
      while(globalError != 0 && iteration<=MAX_ITER);


UPDATE: I've now updated things, and it looks more like this:

  double loss, cost, hypothesis, gradient;
  int p, iteration;

  iteration = 0;
  do 
  {
    iteration++;
    cost = 0.0;
    loss = 0.0;

    //loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++) 
    {

      // 1. Calculate the hypothesis h = X * theta
      hypothesis = calculateHypothesis( theta, feature_matrix__train, p, globo_dict_size );

      // 2. Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
      loss = hypothesis - outputs__train[p];

      // 3. Calculate the gradient = X' * loss / m
      gradient = calculateGradent( theta, feature_matrix__train, p, globo_dict_size, loss );

      // 4. Update the parameters theta = theta - alpha * gradient
      for (int i = 0; i < globo_dict_size; i++) 
      {
          theta[i] = theta[i] - (LEARNING_RATE * gradient);
      }

      //summation of squared error (error value for all instances)
      cost += (loss*loss);
    }

    /* Root Mean Squared Error */
    if (iteration < 10) 
        System.out.println("Iteration 0" + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    else
        System.out.println("Iteration " + iteration + " : RMSE = " + Math.sqrt( cost/number_of_files__train ) );
    //System.out.println( Arrays.toString( weights ) );

  } 
  while(cost != 0 && iteration<=MAX_ITER);


}

static double calculateHypothesis( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    double hypothesis = 0.0;

     for (int i = 0; i < globo_dict_size; i++) 
     {
         hypothesis += ( theta[i] * feature_matrix[file_index][i] );
     }
     //bias
     hypothesis += theta[ globo_dict_size ];

     return hypothesis;
}

static double calculateGradent( double theta[], double[][] feature_matrix, int file_index, int globo_dict_size, double loss )
{
    double gradient = 0.0;

     for (int i = 0; i < globo_dict_size; i++) 
     {
         gradient += ( feature_matrix[file_index][i] * loss);
     }

     return gradient;
}

public static double hingeLoss()
{
    // l(y, f(x)) = max(0, 1 − y · f(x))

    return HINGE;
}

Recommended answer

Your calculateOutput method looks correct. I'm not so sure about your next piece of code:

weights[i] += LEARNING_RATE * localError * feature_matrix__train[p][i] * output_gradient[i]

Look at the rules in your …

Let's try to identify each part of these rules in your code.

  1. Theta0 and Theta1: looks like weights[i] in your code; I hope globo_dict_size = 2;

alpha: seems to be your LEARNING_RATE;

1 / m: I can't find this anywhere in your update rule. m is the number of training instances in Andrew Ng's videos; in your case I think it should be 1 / number_of_files__train. It's not very important, though: it only scales the gradient by a constant, which the learning rate can absorb, so things should work well even without it.

The sum: you do this with the calculateOutput function, whose result you make use of in the localError variable, which you multiply by feature_matrix__train[p][i] (equivalent to x(i) in Andrew Ng's notation).

This part is your partial derivative, and it is part of the gradient!

Why? Because the partial derivative of [h_theta(x(i)) - y(i)]^2 with respect to Theta0 is equal to:

2*[h_theta(x(i)) - y(i)] * derivative[h_theta(x(i)) - y(i)]
derivative[h_theta(x(i)) - y(i)] =
derivative[Theta0 * x(i, 1) + Theta1*x(i, 2) - y(i)] =
x(i, 1)

Of course, you should differentiate the entire sum. This is also why Andrew Ng used 1 / (2m) in the cost function: the 2 cancels out with the 2 we get from differentiating.
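Written out for the two-parameter case in the answer's notation (θ_0 multiplies x(i, 1) and θ_1 multiplies x(i, 2)), the cost with the 1/(2m) factor and its partial derivatives are:

```latex
\begin{aligned}
h_\theta(x^{(i)}) &= \theta_0 x^{(i)}_1 + \theta_1 x^{(i)}_2 \\
J(\theta_0, \theta_1) &= \frac{1}{2m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right]^2 \\
\frac{\partial J}{\partial \theta_0} &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left[ h_\theta(x^{(i)}) - y^{(i)} \right] x^{(i)}_1
  = \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right] x^{(i)}_1 \\
\frac{\partial J}{\partial \theta_1} &= \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) - y^{(i)} \right] x^{(i)}_2 \\
\theta_j &:= \theta_j - \alpha \frac{\partial J}{\partial \theta_j}, \quad j \in \{0, 1\}
\end{aligned}
```

Both parameters are updated from the same θ. With x(i, 1) = 1 the θ_0 update reduces to α (1/m) Σ [h − y], which matches the question's per-instance bias line weights[ globo_dict_size ] += LEARNING_RATE * localError (without the 1/m, and with the sign flipped because localError = y − h there).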

Remember that x(i, 1), or just x(1), should consist of all ones. In your code, you should make sure that:

feature_matrix__train[p][0] == 1
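Putting the pieces above together (alpha, 1/m, the per-instance sum, and the column of ones), here is a minimal Java sketch of batch gradient descent for the linear case. It is an illustration under stated assumptions, not the asker's code: all names and the toy data are hypothetical, and targets are assumed to be real numbers. One difference from the UPDATE code in the question is worth noting: there, calculateGradent adds feature * loss over all features into a single scalar and subtracts that same value from every theta[i], whereas each parameter needs its own component (h − y) * x_i, which is what `gradient` returns here. The question's code keeps a separate bias weight at index globo_dict_size instead of a column of ones; the two approaches are equivalent.

```java
import java.util.Arrays;

// Sketch of batch gradient descent for linear regression, assuming a column of
// ones at index 0 so that theta[0] plays the role of the bias.
// Class and method names are hypothetical, not from the question.
class LinearRegressionSketch {

    // Copy the feature matrix and prepend a constant 1 to every row.
    static double[][] withBiasColumn(double[][] features) {
        double[][] out = new double[features.length][];
        for (int p = 0; p < features.length; p++) {
            out[p] = new double[features[p].length + 1];
            out[p][0] = 1.0;
            System.arraycopy(features[p], 0, out[p], 1, features[p].length);
        }
        return out;
    }

    // h(x) = sum_i theta[i] * x[i]
    static double hypothesis(double[] theta, double[] x) {
        double h = 0.0;
        for (int i = 0; i < theta.length; i++) {
            h += theta[i] * x[i];
        }
        return h;
    }

    // One component per parameter: grad[i] = (1/m) * sum_p (h(x_p) - y_p) * x[p][i]
    static double[] gradient(double[] theta, double[][] x, double[] y) {
        int m = x.length;
        double[] grad = new double[theta.length];
        for (int p = 0; p < m; p++) {
            double loss = hypothesis(theta, x[p]) - y[p];
            for (int i = 0; i < theta.length; i++) {
                grad[i] += loss * x[p][i] / m;
            }
        }
        return grad;
    }

    // theta[i] := theta[i] - alpha * grad[i], using each parameter's own component
    static void step(double[] theta, double[][] x, double[] y, double alpha) {
        double[] grad = gradient(theta, x, y);
        for (int i = 0; i < theta.length; i++) {
            theta[i] -= alpha * grad[i];
        }
    }

    public static void main(String[] args) {
        // Toy data that lies exactly on y = 1 + 2 * x.
        double[][] x = withBiasColumn(new double[][] { {0}, {1}, {2}, {3} });
        double[] y = { 1, 3, 5, 7 };
        double[] theta = new double[2];
        for (int iter = 0; iter < 5000; iter++) {
            step(theta, x, y, 0.1);
        }
        System.out.println(Arrays.deepToString(x));
        System.out.printf("theta0=%.3f theta1=%.3f%n", theta[0], theta[1]);
    }
}
```

On this toy data the parameters should approach theta0 = 1 and theta1 = 2; the learning rate 0.1 and the 5000 iterations are arbitrary choices for the example.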

  • That's it! I don't know what output_gradient[i] is supposed to be in your code; you don't define it anywhere.

    I suggest you take a look at this tutorial to get a better understanding of the algorithm you have used. Since you use the sigmoid function, it seems like you want to do classification, but then you should use a different cost function. That document deals with logistic regression as well.
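As a hedged illustration of the last point, here is a minimal Java sketch of logistic regression trained with batch gradient descent on the cross-entropy cost J(θ) = −(1/m) Σ [y log h + (1 − y) log(1 − h)], with h = σ(θ·x). It is not taken from the tutorial mentioned above. Its gradient is (1/m) Σ (h − y) * x_i, the same form as in the linear case, because the σ(z)(1 − σ(z)) factor from the chain rule cancels. By contrast, keeping the squared-error cost with a sigmoid output leaves that factor in the update (error * output * (1 − output) * x_i); output_gradient in the related question may have referred to this factor, but that is a guess the source does not confirm. Names, toy data, learning rate, and iteration count are assumptions; labels are assumed to be 0 or 1, and x[p][0] == 1.

```java
// Sketch: batch gradient descent for logistic regression (cross-entropy loss).
// Assumptions: labels are 0 or 1, and x[p][0] == 1 so theta[0] is the bias.
// All names are hypothetical.
class LogisticRegressionSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // h(x) = sigmoid(theta . x), interpreted as P(y = 1 | x)
    static double predict(double[] theta, double[] x) {
        double z = 0.0;
        for (int i = 0; i < theta.length; i++) {
            z += theta[i] * x[i];
        }
        return sigmoid(z);
    }

    // Gradient of J = -(1/m) sum [y log h + (1 - y) log(1 - h)]:
    // dJ/dtheta_i = (1/m) sum_p (h_p - y_p) * x[p][i]  (no sigmoid-derivative factor)
    static double[] gradient(double[] theta, double[][] x, double[] y) {
        double[] grad = new double[theta.length];
        for (int p = 0; p < x.length; p++) {
            double error = predict(theta, x[p]) - y[p];
            for (int i = 0; i < theta.length; i++) {
                grad[i] += error * x[p][i] / x.length;
            }
        }
        return grad;
    }

    // Plain gradient descent: every parameter moves by its own gradient component.
    static void train(double[] theta, double[][] x, double[] y, double alpha, int iterations) {
        for (int it = 0; it < iterations; it++) {
            double[] grad = gradient(theta, x, y);
            for (int i = 0; i < theta.length; i++) {
                theta[i] -= alpha * grad[i];
            }
        }
    }

    public static void main(String[] args) {
        // Toy 1-D data: class 1 when the feature is large; first column is the constant 1.
        double[][] x = { {1, 0.0}, {1, 1.0}, {1, 3.0}, {1, 4.0} };
        double[] y = { 0, 0, 1, 1 };
        double[] theta = new double[2];
        train(theta, x, y, 0.5, 2000);
        for (double[] row : x) {
            System.out.printf("x=%.1f  P(y=1)=%.3f%n", row[1], predict(theta, row));
        }
    }
}
```

Because this toy data is linearly separable, the weights keep growing slowly with more iterations; real code would typically add regularization or a stopping criterion.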
