Gradient Descent implementation in Octave

Question

I've been struggling with this for about two months now. What makes the following two versions behave differently?

hypotheses= X * theta
temp=(hypotheses-y)'
temp=X(:,1) * temp
temp=temp * (1 / m)
temp=temp * alpha
theta(1)=theta(1)-temp

hypotheses= X * theta
temp=(hypotheses-y)'
temp=temp * (1 / m)
temp=temp * alpha
theta(2)=theta(2)-temp



theta(1) = theta(1) - alpha * (1/m) * ((X * theta) - y)' * X(:, 1);
theta(2) = theta(2) - alpha * (1/m) * ((X * theta) - y)' * X(:, 2);

The latter works. I'm just not sure why. I struggle to understand the need for the matrix inverse.

Recommended answer

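In the second block of your first example you've missed out a step, haven't you? I am assuming you concatenated X with a vector of ones. The missing step is the multiplication by the second column of X. Note that temp is a 1 x m row vector after the transpose, so it has to go on the left of the product to give a scalar; X(:,1) * temp, as written in your first block, gives an m x m matrix instead.

   temp = temp * X(:,2)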
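
The shapes involved can be checked on toy data. The sketch below uses made-up values for X and y (they are not from the question) and the hypothetical names `err`, `outer_product` and `inner_product`; it only illustrates how the order of the product changes the size of the result.

```octave
% Hypothetical toy data: m = 4 examples, one feature plus a column of ones.
X = [ones(4, 1), [1; 2; 3; 4]];
y = [2; 4; 6; 8];
theta = zeros(2, 1);

err = (X * theta - y)';            % 1 x m row vector of residuals

outer_product = X(:, 2) * err;     % (m x 1) * (1 x m) gives an m x m matrix
inner_product = err * X(:, 2);     % (1 x m) * (m x 1) gives a scalar

disp(size(outer_product));         % size is [4 4]
disp(size(inner_product));         % size is [1 1]
```

Only the scalar can be subtracted from theta(2); assigning an m x m matrix to a single element raises a size error.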

Your last example (the two one-line updates) will work, but it can be vectorized further to make it simpler and more efficient.

I've assumed you only have one feature. It works the same way with multiple features, since all that changes is that you add an extra column to your X matrix for each feature. Basically, you prepend a vector of ones to x so that the intercept is handled by the same matrix product.

You can update a 2x1 matrix of thetas in one line of code. Concatenate a vector of ones with x, making it an nx2 matrix (n is the number of examples, called m in the code). You can then calculate h(x) by multiplying it by the theta vector (2x1); this is the (X * theta) bit.
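
As a minimal sketch of that step, here is the same construction with two features instead of one. The input values and the theta vector are made up for illustration and are not from the question.

```octave
% Hypothetical raw inputs: 3 examples, 2 features (values are made up).
x = [1 2; 3 4; 5 6];
n = size(x, 1);                    % number of examples

X = [ones(n, 1), x];               % n x 3: column of ones, then the features
theta = [0.5; 1; -1];              % 3 x 1: intercept plus one weight per feature

h = X * theta;                     % n x 1 vector of hypotheses h(x)
disp(h);
```

Each extra feature adds one column to X and one entry to theta; the expression X * theta itself does not change.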

The second part of the vectorization is to transpose ((X * theta) - y), which gives a 1xn matrix. Multiplying that by X (an nx2 matrix) aggregates both sums, (h(x)-y)x0 and (h(x)-y)x1, in a single product, so by construction both thetas are updated at the same time. The result is a 1x2 matrix of update terms, which I transpose again so that it has the same dimensions as the theta vector. I can then do a simple scalar multiplication by alpha and a vector subtraction from theta.

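% data is assumed to be an m x 2 matrix: column 1 = feature, column 2 = target
X = data(:, 1); y = data(:, 2);
m = length(y);                 % number of training examples
X = [ones(m, 1), data(:,1)];   % prepend a column of ones for the intercept
theta = zeros(2, 1);           % initial parameters, 2 x 1

iterations = 2000;
alpha = 0.001;                 % learning rate

for iter = 1:iterations
     % simultaneous update of both parameters
     theta = theta -((1/m) * ((X * theta) - y)' * X)' * alpha;
end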
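
To see that the one-line update matches the per-component formulas, a single step can be compared on synthetic data. This is only a sketch: the data (y = 1 + 2x), the starting theta and the names `theta_vec`, `theta_simul` and `theta_seq` are assumptions made for the illustration. It also shows that updating theta(1) in place before computing theta(2), as the two one-liners in the question do, gives a slightly different step.

```octave
% Hypothetical synthetic data standing in for `data`: y = 1 + 2*x exactly.
x = (1:5)';
y = 1 + 2 * x;
m = length(y);
X = [ones(m, 1), x];
alpha = 0.01;
theta = [0.1; 0.2];                % arbitrary starting point

% Vectorized step, as in the loop above.
theta_vec = theta - ((1/m) * ((X * theta) - y)' * X)' * alpha;

% Same step per component, both computed from the OLD theta (simultaneous).
err = (X * theta) - y;
t1 = theta(1) - alpha * (1/m) * err' * X(:, 1);
t2 = theta(2) - alpha * (1/m) * err' * X(:, 2);
theta_simul = [t1; t2];

% Sequential variant: theta(2) is computed after theta(1) has changed.
theta_seq = theta;
theta_seq(1) = theta_seq(1) - alpha * (1/m) * ((X * theta_seq) - y)' * X(:, 1);
theta_seq(2) = theta_seq(2) - alpha * (1/m) * ((X * theta_seq) - y)' * X(:, 2);

disp(max(abs(theta_vec - theta_simul)));  % zero up to rounding
disp(abs(theta_vec(2) - theta_seq(2)));   % small but nonzero
```

With a small alpha the sequential variant still converges in practice, but only the vectorized form (or the version with temporaries) performs the simultaneous update.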
