How to determine the learning rate and the variance in a gradient descent algorithm?


Problem Description

I started learning machine learning last week. When I wanted to write a gradient descent script to estimate the model parameters, I ran into a problem: how to choose an appropriate learning rate and variance. I found that different (learning rate, variance) pairs may lead to different results, and sometimes the algorithm doesn't converge at all. Also, a well-chosen (learning rate, variance) pair may stop working if you switch to another training set. For example (script below), when I set the learning rate to 0.001 and the variance to 0.00001, I get suitable theta0_guess and theta1_guess values for 'data1'. But for 'data2' the algorithm doesn't converge, even after I tried dozens of (learning rate, variance) pairs.

So could anybody tell me whether there are criteria or methods for determining a good (learning rate, variance) pair?

# Batch gradient descent for single-variable linear regression.

data1 = [(0.000000, 95.364693),
         (1.000000, 97.217205),
         (2.000000, 75.195834),
         (3.000000, 60.105519),
         (4.000000, 49.342380),
         (5.000000, 37.400286),
         (6.000000, 51.057128),
         (7.000000, 25.500619),
         (8.000000, 5.259608),
         (9.000000, 0.639151),
         (10.000000, -9.409936),
         (11.000000, -4.383926),
         (12.000000, -22.858197),
         (13.000000, -37.758333),
         (14.000000, -45.606221)]

data2 = [(2104., 400.),
         (1600., 330.),
         (2400., 369.),
         (1416., 232.),
         (3000., 540.)]

def create_hypothesis(theta1, theta0):
    # Hypothesis for single-variable linear regression: h(x) = theta1*x + theta0.
    return lambda x: theta1*x + theta0

def linear_regression(data, learning_rate=0.001, variance=0.00001):
    # Initial parameter guesses.
    theta0_guess = 1.
    theta1_guess = 1.

    # Previous values, initialised far from the guesses so the loop runs at least once.
    theta0_last = 100.
    theta1_last = 100.

    m = len(data)

    # Despite its name, 'variance' is really a convergence tolerance: iterate
    # until neither parameter moves by more than this amount per iteration.
    while (abs(theta1_guess - theta1_last) > variance or
           abs(theta0_guess - theta0_last) > variance):

        theta1_last = theta1_guess
        theta0_last = theta0_guess

        # Capture the hypothesis with the current parameters so that both
        # updates below use the same (simultaneous) values.
        hypothesis = create_hypothesis(theta1_guess, theta0_guess)

        # Batch gradient descent updates for the squared-error cost.
        theta0_guess = theta0_guess - learning_rate * (1./m) * sum(hypothesis(x) - y for (x, y) in data)
        theta1_guess = theta1_guess - learning_rate * (1./m) * sum((hypothesis(x) - y) * x for (x, y) in data)

    return (theta0_guess, theta1_guess)



points = [(float(x), float(y)) for (x, y) in data1]

res = linear_regression(points)
print(res)
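
For reference, the updates inside the loop are the standard batch gradient descent rules for the squared-error cost of single-variable linear regression; this formulation is implied by the code rather than stated in the original post:

$$h(x) = \theta_1 x + \theta_0, \qquad J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h(x_i) - y_i\bigr)^2$$

$$\theta_0 \leftarrow \theta_0 - \alpha \cdot \frac{1}{m}\sum_{i=1}^{m}\bigl(h(x_i) - y_i\bigr), \qquad \theta_1 \leftarrow \theta_1 - \alpha \cdot \frac{1}{m}\sum_{i=1}^{m}\bigl(h(x_i) - y_i\bigr)\, x_i$$

where $\alpha$ is the learning_rate and $m$ is the number of training points.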

Answer

Plotting is the best way to see how your algorithm is performing. To check whether you have achieved convergence, plot the evolution of the cost function after each iteration: after a certain number of iterations you will see that it no longer improves much, and you can assume convergence. Take a look at the following code:

# The same loop as in linear_regression, but recording the value of the
# squared-error cost function after every iteration.
cost_f = []
while (abs(theta1_guess - theta1_last) > variance or
       abs(theta0_guess - theta0_last) > variance):

    theta1_last = theta1_guess
    theta0_last = theta0_guess

    hypothesis = create_hypothesis(theta1_guess, theta0_guess)
    # Cost J(theta0, theta1) for the current parameters.
    cost_f.append((1./(2*m)) * sum((hypothesis(x) - y)**2 for (x, y) in data))

    theta0_guess = theta0_guess - learning_rate * (1./m) * sum(hypothesis(x) - y for (x, y) in data)
    theta1_guess = theta1_guess - learning_rate * (1./m) * sum((hypothesis(x) - y) * x for (x, y) in data)

import matplotlib.pyplot as plt
plt.plot(range(len(cost_f)), cost_f)
plt.show()

This will plot the following graphic (execution with learning_rate=0.01, variance=0.00001):

As you can see, after a thousand iterations you don't get much improvement. I normally declare convergence when the cost function decreases by less than 0.001 in one iteration, but that is just based on my own experience.
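
As a minimal sketch of that rule of thumb, the stopping test can be driven by the cost itself instead of by the parameter change. The 0.001 threshold and the max_iterations cap below are illustrative choices, not part of the original answer; create_hypothesis and the (x, y) data format are the same as above:

def cost(theta0, theta1, data):
    # Squared-error cost J(theta0, theta1) of the current hypothesis.
    m = len(data)
    h = create_hypothesis(theta1, theta0)
    return (1./(2*m)) * sum((h(x) - y)**2 for (x, y) in data)

def gradient_descent_cost_stop(data, learning_rate=0.001, tol=0.001, max_iterations=100000):
    theta0, theta1 = 1., 1.
    m = len(data)
    last_cost = cost(theta0, theta1, data)
    for _ in range(max_iterations):
        h = create_hypothesis(theta1, theta0)
        theta0 -= learning_rate * (1./m) * sum(h(x) - y for (x, y) in data)
        theta1 -= learning_rate * (1./m) * sum((h(x) - y) * x for (x, y) in data)
        new_cost = cost(theta0, theta1, data)
        if abs(last_cost - new_cost) < tol:  # converged by the cost criterion
            break
        last_cost = new_cost
    return theta0, theta1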

For choosing the learning rate, the best thing you can do is again plot the cost function and see how it performs, always keeping these two things in mind (a small sketch follows the list):

  • if the learning rate is too small, you will get slow convergence
  • if the learning rate is too large, the cost function may not decrease at every iteration, and the algorithm will therefore not converge
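
A quick way to act on those two points is to run a fixed number of iterations for several candidate learning rates and plot the resulting cost curves side by side. This is only a sketch following the answer's advice; the candidate rates and the 1000-iteration budget are arbitrary choices:

import matplotlib.pyplot as plt

def cost_curve(data, learning_rate, iterations=1000):
    # Run batch gradient descent for a fixed number of iterations and
    # return the cost recorded after each one.
    theta0, theta1 = 1., 1.
    m = len(data)
    costs = []
    for _ in range(iterations):
        h = create_hypothesis(theta1, theta0)
        costs.append((1./(2*m)) * sum((h(x) - y)**2 for (x, y) in data))
        theta0 -= learning_rate * (1./m) * sum(h(x) - y for (x, y) in data)
        theta1 -= learning_rate * (1./m) * sum((h(x) - y) * x for (x, y) in data)
    return costs

for lr in (0.0001, 0.001, 0.01):
    plt.plot(cost_curve(data1, lr), label='learning_rate=%s' % lr)
plt.xlabel('iteration')
plt.ylabel('cost')
plt.legend()
plt.show()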

If you run your code choosing learning_rate > 0.029 with variance=0.001, you will be in the second case: gradient descent doesn't converge. If instead you choose learning_rate < 0.0001 with variance=0.001, you will see that the algorithm takes a lot of iterations to converge.
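
One practical consequence worth guarding against: with a too-large learning rate, the parameter changes keep growing instead of shrinking, so the original while loop can run essentially forever before the floats overflow. A small sketch of a guard follows; the divergence test and the iteration cap are illustrative choices, not from the original answer:

import math

def safe_linear_regression(data, learning_rate=0.001, variance=0.00001, max_iterations=100000):
    # Like linear_regression, but bail out when the parameters blow up
    # (a sign that the learning rate is too large) or when the
    # iteration budget is exhausted (a sign that it is too small).
    theta0, theta1 = 1., 1.
    theta0_last, theta1_last = 100., 100.
    m = len(data)
    iterations = 0
    while (abs(theta1 - theta1_last) > variance or abs(theta0 - theta0_last) > variance):
        if iterations >= max_iterations or not (math.isfinite(theta0) and math.isfinite(theta1)):
            raise RuntimeError('diverged or converging too slowly: adjust the learning rate')
        theta1_last, theta0_last = theta1, theta0
        h = create_hypothesis(theta1, theta0)
        theta0 -= learning_rate * (1./m) * sum(h(x) - y for (x, y) in data)
        theta1 -= learning_rate * (1./m) * sum((h(x) - y) * x for (x, y) in data)
        iterations += 1
    return theta0, theta1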

Example of non-convergence, with learning_rate=0.03:

Example of slow convergence, with learning_rate=0.0001:
