Python gradient-descent multi-regression - cost increases to infinity
Problem description
I am writing this algorithm for my final-year project. I used gradient descent to find the minimum, but instead the cost grows to infinity.
I have checked the gradientDescent function and believe it is correct.
The CSV I am importing, or the way I am reading it, seems to be causing the error. The data in the CSV has the format below.
"|"之前的每个四边形是一行.
Each quad before '|' is a row.
前3列是自变量x. 第四列与y相关.
First 3 columns are independent variables x. 4th column is dependent y.
600 20 0.5 0.63 | 600 20 1 1.5 | 800 20 0.5 0.9
import numpy as np
import random
import pandas as pd

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here,
        # but to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=",",header=None)
x = df.loc[:,'0':'2'].as_matrix()
y = df[3].as_matrix()
print(x)
print(y)
m, n = np.shape(x)
numIterations= 100
alpha = 0.001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
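A quick way to sanity-check the gradientDescent function itself is to run it on small, well-scaled synthetic inputs where the true coefficients are known (the data and values below are hypothetical, not from the CSV above). It converges there, which points at the data loading rather than the update rule:

```python
import numpy as np

def gradientDescent(x, y, theta, alpha, m, numIterations):
    # same update rule as in the question, print statement omitted
    xTrans = x.transpose()
    for i in range(numIterations):
        loss = np.dot(x, theta) - y
        theta = theta - alpha * np.dot(xTrans, loss) / m
    return theta

# synthetic data generated from known coefficients [0.5, 1.5]
x = np.array([[1., 2.], [2., 1.], [3., 4.], [4., 3.]])
y = x.dot(np.array([0.5, 1.5]))

theta_hat = gradientDescent(x, y, np.ones(2), alpha=0.05, m=4, numIterations=2000)
print(theta_hat)  # close to [0.5, 1.5]
```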
Recommended answer
As forayer mentioned in the comments, the problem is in the line where you read the CSV. You are setting delimiter=",", which means pandas expects each column in your data to be separated by a comma. However, in your data the columns are separated by whitespace.
Just replace that line with
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=" ",header=None)
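To see the difference, here is a minimal sketch (using an in-memory string in place of the original file path) comparing the two delimiters on the sample rows:

```python
import pandas as pd
from io import StringIO

data = "600 20 0.5 0.63\n600 20 1 1.5\n800 20 0.5 0.9\n"

# comma delimiter: each whole row ends up in a single column
df_bad = pd.read_csv(StringIO(data), delimiter=",", header=None)
# whitespace delimiter: rows split into the intended 4 columns
df_good = pd.read_csv(StringIO(data), delimiter=" ", header=None)

print(df_bad.shape)   # (3, 1)
print(df_good.shape)  # (3, 4)
```

Note also that in recent pandas versions as_matrix() has been removed in favor of .to_numpy(), and with header=None the column labels are integers, so the slice should be df.loc[:, 0:2] (integer labels) rather than '0':'2'.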