How to create a simple Gradient Descent algorithm


Question

I'm studying simple machine learning algorithms, beginning with a simple gradient descent, but I've got some trouble trying to implement it in Python.

Here is the example I'm trying to reproduce. I've got data about houses, the living area (in feet²) and the number of bedrooms, with the resulting price:

Living area (feet²) : 2104
#bedrooms : 3
Price (1000$s) : 400

I'm trying to do a simple regression using the gradient descent method, but my algorithm won't work... The form of the algorithm is not using vectors on purpose (I'm trying to understand it step by step).

import random

i = 1
derror = float('inf')   # change in error since the last iteration
error = 0
step = 0.0001           # learning rate
dthresh = 0.1           # stop once the error changes by less than this

theta0 = random.random()
theta1 = random.random()
theta2 = random.random()
while derror > dthresh:
    # residual for the single point (2104 ft², 3 bedrooms, 400 k$)
    diff = 400 - theta0 - 2104 * theta1 - 3 * theta2
    theta0 = theta0 + step * diff * 1
    theta1 = theta1 + step * diff * 2104
    theta2 = theta2 + step * diff * 3
    hserror = diff ** 2 / 2          # half squared error for this point
    derror = abs(error - hserror)
    error = hserror
    print('iteration : %d, error : %s' % (i, error))
    i += 1

I understand the math. I'm constructing a predicting function h(x) = theta0 + theta1*x1 + theta2*x2, with x1 and x2 being the variables (living area, number of bedrooms) and h(x) the estimated price.

I'm using the cost function (for one point) J = (1/2) * (h(x) - y)^2, where y is the actual price. This is a usual problem, but I'm more of a software engineer and I'm learning one step at a time; can you tell me what's wrong?

I'm using this code:

import random

data = {(2104, 3): 400, (1600, 3): 330, (2400, 3): 369,
        (1416, 2): 232, (3000, 4): 540}

for x in range(10):
    i = 1
    derror = float('inf')
    error = 0
    step = 0.00000001
    dthresh = 0.0000000001

    theta0 = random.random() * 100
    theta1 = random.random() * 100
    theta2 = random.random() * 100
    while derror > dthresh:
        # note: this still fits only the first point, not the whole of data
        diff = 400 - (theta0 + 2104 * theta1 + 3 * theta2)
        theta0 = theta0 + step * diff * 1
        theta1 = theta1 + step * diff * 2104
        theta2 = theta2 + step * diff * 3
        hserror = diff ** 2 / 2
        derror = abs(error - hserror)
        error = hserror
        # print('iteration : %d, error : %s, derror : %s' % (i, error, derror))
        i += 1
    print(' theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))
    print(' done : %f' % (theta0 + 2104 * theta1 + 3 * theta2))

and I end up with answers like this:

 theta0 : 48.412337, theta1 : 0.094492, theta2 : 50.925579
 done : 400.000043
 theta0 : 0.574007, theta1 : 0.185363, theta2 : 3.140553
 done : 400.000042
 theta0 : 28.588457, theta1 : 0.041746, theta2 : 94.525769
 done : 400.000043
 theta0 : 42.240593, theta1 : 0.096398, theta2 : 51.645989
 done : 400.000043
 theta0 : 98.452431, theta1 : 0.136432, theta2 : 4.831866
 done : 400.000043
 theta0 : 18.022160, theta1 : 0.148059, theta2 : 23.487524
 done : 400.000043
 theta0 : 39.461977, theta1 : 0.097899, theta2 : 51.519412
 done : 400.000042
 theta0 : 40.979868, theta1 : 0.040312, theta2 : 91.401406
 done : 400.000043
 theta0 : 15.466259, theta1 : 0.111276, theta2 : 50.136221
 done : 400.000043
 theta0 : 72.380926, theta1 : 0.013814, theta2 : 99.517853
 done : 400.000043

Answer

The first issue is that running this with only one piece of data gives you an underdetermined system... it may have an infinite number of solutions. With three variables, you'd expect at least 3 data points, preferably many more.
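To illustrate the point, here is a minimal sketch of the same non-vectorized update run over all five data points from the question (batch gradient descent); the step size and iteration count are illustrative assumptions, not values from the original answer:

```python
# Batch gradient descent over all five points, so the system is no longer
# underdetermined (step size and iteration count chosen for illustration)
data = {(2104, 3): 400, (1600, 3): 330, (2400, 3): 369,
        (1416, 2): 232, (3000, 4): 540}

step = 1e-8                 # small because the features are unscaled
theta0 = theta1 = theta2 = 0.0

for _ in range(50000):
    g0 = g1 = g2 = 0.0
    # accumulate the gradient over every point instead of just one
    for (area, beds), price in data.items():
        diff = price - (theta0 + theta1 * area + theta2 * beds)
        g0 += diff
        g1 += diff * area
        g2 += diff * beds
    theta0 += step * g0
    theta1 += step * g1
    theta2 += step * g2

print('theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))
```

Without feature scaling the intercept and the bedroom coefficient converge extremely slowly, so theta1 (price per square foot) ends up dominating the fit; rescaling the inputs would fix that.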

Secondly, using gradient descent where the step size is a scaled version of the gradient is not guaranteed to converge except in a small neighbourhood of the solution. You can fix that by switching either to a fixed-size step in the direction of the negative gradient (slow), or to a linesearch in the direction of the negative gradient (faster, but slightly more complicated).

So for a fixed step size, instead of

theta0 = theta0 - step * dEdtheta0
theta1 = theta1 - step * dEdtheta1
theta2 = theta2 - step * dEdtheta2

you do

n = max(abs(dEdtheta0), abs(dEdtheta1), abs(dEdtheta2))
theta0 = theta0 - step * dEdtheta0 / n
theta1 = theta1 - step * dEdtheta1 / n
theta2 = theta2 - step * dEdtheta2 / n
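The linesearch variant can be sketched with a standard backtracking (Armijo) rule; the helper names below are illustrative rather than from the original answer, and E/grad_E use the single-point cost from the question just to keep the example small:

```python
# Backtracking linesearch along the negative gradient for the
# single-point cost E(theta) = (1/2) * (400 - h(x))**2
def E(theta):
    diff = 400 - (theta[0] + 2104 * theta[1] + 3 * theta[2])
    return diff ** 2 / 2

def grad_E(theta):
    diff = 400 - (theta[0] + 2104 * theta[1] + 3 * theta[2])
    return [-diff, -diff * 2104, -diff * 3]

def backtracking_step(theta, t=1.0, beta=0.5, c=1e-4):
    g = grad_E(theta)
    gg = sum(gi * gi for gi in g)
    # shrink t until the sufficient-decrease (Armijo) condition holds
    while E([th - t * gi for th, gi in zip(theta, g)]) > E(theta) - c * t * gg:
        t *= beta
    return [th - t * gi for th, gi in zip(theta, g)]

theta = [0.0, 0.0, 0.0]
for _ in range(20):
    theta = backtracking_step(theta)
print('E after 20 linesearch steps :', E(theta))
```

Each call shrinks the trial step until the cost is guaranteed to decrease, so no hand-tuned step size is needed.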

It also looks like you may have a sign error in your steps.

I'm also not sure that derror is a good stopping criterion. (But stopping criteria are notoriously hard to get "right".)

My final point is that gradient descent is horribly slow for parameter fitting. You probably want to use conjugate-gradient or Levenberg-Marquardt methods instead. I suspect that both of these methods already exist for Python in the numpy or scipy packages (which aren't part of Python by default, but are pretty easy to install).
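As a point of comparison, this particular model is linear in its parameters, so numpy can solve the least-squares problem directly with no iteration at all (a sketch using the five data points from the question):

```python
import numpy as np

# Design matrix: a column of ones (intercept), living area, bedrooms
X = np.array([[1, 2104, 3], [1, 1600, 3], [1, 2400, 3],
              [1, 1416, 2], [1, 3000, 4]], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

# Least-squares solution theta = [theta0, theta1, theta2]
theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print('theta0 : %f, theta1 : %f, theta2 : %f' % tuple(theta))
```

For the genuinely nonlinear case, scipy.optimize.least_squares (e.g. with method='lm') provides the Levenberg-Marquardt machinery mentioned above.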

