Gradient descent using Python and NumPy


Question

def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*(  np.sum( (h-y)*X_norm[:,j][np.newaxis,:] )  )
        temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
        temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
        theta=temp
    return theta



X_norm,mean,std=featureScale(X)
#length of X (number of rows)
m=len(X)
X_norm=np.array([np.ones(m),X_norm])
n,m=np.shape(X_norm)
num_it=1500
alpha=0.01
theta=np.zeros(n,float)[:,np.newaxis]
X_norm=X_norm.transpose()
theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
print theta

My theta from the above code is 100.2 100.2, but it should be 100.2 61.09, which is what MATLAB computes and is correct.

Answer

I think your code is a bit too complicated and needs more structure, because otherwise you'll be lost in all the equations and operations. In the end this regression boils down to four operations:

  1. Calculate the hypothesis: h = X * theta
  2. Calculate the loss: loss = h - y, and optionally the squared cost (loss^2) / 2m
  3. Calculate the gradient: gradient = X' * loss / m
  4. Update the parameters: theta = theta - alpha * gradient (see the sketch after this list)
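
For reference, a minimal NumPy sketch of one iteration of these four steps (the function name and shapes are illustrative, not from the original answer; X is the m-by-n design matrix):

import numpy as np

# One gradient descent step; X is (m, n), y is (m,), theta is (n,).
def step(X, y, theta, alpha):
    m = len(y)
    h = np.dot(X, theta)              # 1. hypothesis
    loss = h - y                      # 2. loss; cost would be np.sum(loss ** 2) / (2 * m)
    gradient = np.dot(X.T, loss) / m  # 3. gradient
    return theta - alpha * gradient   # 4. parameter update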

In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.
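
To keep the convention straight, it helps to derive both from the shape of the design matrix; a tiny illustration (sizes are made up):

import numpy as np

X = np.ones((100, 2))  # 100 training examples, 2 features (made-up sizes)
m, n = X.shape         # m = 100 examples, n = 2 features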

Let's have a look at my variation of your code:

import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)

At first I create a small random dataset. Plotted, it is basically a noisy straight line; to that plot I also added the generated regression line and the formula that was calculated by Excel.

You need to take care about the intuition of regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m per-example losses to a single weight update. In this case, this is the average of the sum over the gradients, thus the division by m.

The next thing you need to take care of is tracking the convergence and adjusting the learning rate. For that matter you should always track your cost on every iteration, maybe even plot it.
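
One way to do that, assuming matplotlib is available, is to collect the cost in a list inside the loop and plot it afterwards; a sketch based on the gradientDescent function above (the function name here is my own):

import numpy as np
import matplotlib.pyplot as plt

def gradientDescentWithHistory(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    costs = []  # cost per iteration, for convergence checks
    for i in range(numIterations):
        loss = np.dot(x, theta) - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        theta = theta - alpha * np.dot(xTrans, loss) / m
    return theta, costs

# theta, costs = gradientDescentWithHistory(x, y, np.ones(n), 0.0005, m, 1000)
# plt.plot(costs); plt.xlabel("iteration"); plt.ylabel("cost"); plt.show()

A flat cost curve means convergence; a rising or oscillating one usually means alpha is too large.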

If you run my example, the theta returned will look like this:

Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]

This is actually quite close to the equation that was calculated by Excel (y = x + 30). Note that as we passed the bias into the first column, the first theta value denotes the bias weight.
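
As a sanity check (not part of the original answer), a direct least-squares fit should land near the same values; for this single-feature setup, np.polyfit gives it in one call:

import numpy as np

# Compare the gradient descent result against a closed-form least-squares fit.
# x and y come from genData above; x[:, 1] is the single real feature.
slope, intercept = np.polyfit(x[:, 1], y, 1)
print(intercept, slope)  # should be close to theta[0], theta[1]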

