Linear regression gradient descent algorithms in R produce varying results


Question

I am trying to implement a linear regression in R from scratch without using any packages or libraries using the following data:


UCI Machine Learning Repository, Bike-Sharing-Dataset

The linear regression was easy enough, here is the code:

data <- read.csv("Bike-Sharing-Dataset/hour.csv")

# Select the useable features
data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")]

# Split the data
trainingObs<-sample(nrow(data1),0.70*nrow(data1),replace=FALSE)

# Create the training dataset
trainingDS<-data1[trainingObs,]

# Create the test dataset
testDS<-data1[-trainingObs,]

x0 <- rep(1, nrow(trainingDS)) # column of 1's
x1 <- trainingDS[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed")]

# create the x- matrix of explanatory variables
x <- as.matrix(cbind(x0,x1))

# create the y-matrix of dependent variables
y <- as.matrix(trainingDS$cnt)
m <- nrow(y)

# Estimate the coefficients with the normal equation: (X'X)^(-1) X'y
solve(t(x)%*%x)%*%t(x)%*%y 

The next step is to implement the batch-update gradient descent, and here is where I am running into problems. I don't know where the errors are coming from or how to fix them, but the code does run. The problem is that the values being produced are radically different from the results of the regression, and I am unsure why.

The two versions of the batch update gradient descent that I have implemented are as follows (the results of both algorithms differ from one another and from the results of the regression):

# Gradient descent 1
gradientDesc <- function(x, y, learn_rate, conv_threshold, n, max_iter) {
  plot(x, y, col = "blue", pch = 20)
  m <- runif(1, 0, 1)
  c <- runif(1, 0, 1)
  yhat <- m * x + c
  MSE <- sum((y - yhat) ^ 2) / n
  converged = F
  iterations = 0
  while(converged == F) {
    ## Implement the gradient descent algorithm
    m_new <- m - learn_rate * ((1 / n) * (sum((yhat - y) * x)))
    c_new <- c - learn_rate * ((1 / n) * (sum(yhat - y)))
    m <- m_new
    c <- c_new
    yhat <- m * x + c
    MSE_new <- sum((y - yhat) ^ 2) / n
    if(MSE - MSE_new <= conv_threshold) {
      abline(c, m) 
      converged = T
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
    iterations = iterations + 1
    if(iterations > max_iter) { 
      abline(c, m) 
      converged = T
      return(paste("Optimal intercept:", c, "Optimal slope:", m))
    }
  }
  return(paste("MSE=", MSE))
}

AND:

grad <- function(x, y, theta) { # note that for readability, I redefined theta as a column vector
  gradient <-  1/m* t(x) %*% (x %*% theta - y) 
  return(gradient)
}
grad.descent <- function(x, maxit, alpha){
  theta <- matrix(rep(0, length=ncol(x)), ncol = 1)
  for (i in 1:maxit) {
    theta <- theta - alpha  * grad(x, y, theta)   
  }
  return(theta)
}

If someone could explain why these two functions are producing different results I would greatly appreciate it. I also want to make sure that I am in fact implementing the gradient descent correctly.

Lastly, how can I plot the results of the descent with varying learning rates and superimpose this data over the results of the regression itself?
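
For illustration, a minimal sketch of one way such a comparison could be set up (not part of the original post, and assuming the x and y matrices built above): record the MSE at every iteration for several learning rates and plot the cost curves, with the MSE of the OLS solution drawn as a dashed reference line.

# Illustrative sketch only: cost curves for several learning rates vs. the OLS MSE.
# Assumes x (design matrix with intercept column) and y as built above.
plot_learning_rates <- function(x, y, alphas, maxit) {
  n <- nrow(x)
  ols <- solve(t(x) %*% x) %*% t(x) %*% y
  mse_ols <- sum((y - x %*% ols)^2) / n

  # One column of per-iteration MSE values per learning rate
  cost <- sapply(alphas, function(alpha) {
    theta <- matrix(0, ncol(x), 1)
    sapply(seq_len(maxit), function(i) {
      theta <<- theta - alpha * (1 / n) * t(x) %*% (x %*% theta - y)
      sum((y - x %*% theta)^2) / n
    })
  })

  matplot(cost, type = "l", lty = 1, col = seq_along(alphas),
          xlab = "Iteration", ylab = "MSE",
          main = "Gradient descent cost by learning rate")
  abline(h = mse_ols, lty = 2)  # OLS benchmark
  legend("topright", legend = c(paste("alpha =", alphas), "OLS"),
         col = c(seq_along(alphas), 1), lty = c(rep(1, length(alphas)), 2))
}

# Example call (small learning rates; large ones can diverge on unscaled data):
# plot_learning_rates(x, y, alphas = c(0.0005, 0.001, 0.005), maxit = 5000)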

EDIT

Here are the results of running the two algorithms with alpha = .005 and 10,000 iterations:

1)

> gradientDesc(trainingDS, y, 0.005, 0.001, 32, 10000)
TEXT_SHOW_BACKTRACE environmental variable.
[1] "Optimal intercept: 2183458.95872599 Optimal slope: 62417773.0184353"

2)

> print(grad.descent(x, 10000, .005))
                   [,1]
x0            8.3681113
season       19.8399837
mnth         -0.3515479
hr            8.0269388
holiday     -16.2429750
weekday       1.9615369
workingday    7.6063719
weathersit  -12.0611266
temp        157.5315413
atemp       138.8019732
hum        -162.7948299
windspeed    31.5442471


Answer

To give you an example of how to write functions like this in a slightly better way, consider the following:

gradientDesc <- function(x, y, learn_rate, conv_threshold, max_iter) {
  n <- nrow(x) 
  m <- runif(ncol(x), 0, 1) # m is a vector of dimension ncol(x), 1
  yhat <- x %*% m # since x already contains a constant, no need to add another one

  MSE <- sum((y - yhat) ^ 2) / n

  converged = F
  iterations = 0

  while(converged == F) {
    m <- m - learn_rate * ( 1/n * t(x) %*% (yhat - y))
    yhat <- x %*% m
    MSE_new <- sum((y - yhat) ^ 2) / n

    if( abs(MSE - MSE_new) <= conv_threshold) {
      converged = T
    }
    iterations = iterations + 1
    MSE <- MSE_new

    if(iterations >= max_iter) break
  }
  return(list(converged = converged, 
              num_iterations = iterations, 
              MSE = MSE_new, 
              coefs = m) )
}

For comparison:

ols <- solve(t(x)%*%x)%*%t(x)%*%y 

Now,

out <- gradientDesc(x,y, 0.005, 1e-7, 200000)

data.frame(ols, out$coefs)
                    ols    out.coefs
x0           33.0663095   35.2995589
season       18.5603565   18.5779534
mnth         -0.1441603   -0.1458521
hr            7.4374031    7.4420685
holiday     -21.0608520  -21.3284449
weekday       1.5115838    1.4813259
workingday    5.9953383    5.9643950
weathersit   -0.2990723   -0.4073493
temp        100.0719903  147.1157262
atemp       226.9828394  174.0260534
hum        -225.7411524 -225.2686640
windspeed    12.3671942    9.5792498

Here, x refers to your x as defined in your first code chunk. Note the similarity between the coefficients. However, also note that

out$converged
[1] FALSE

so that you could increase the accuracy by increasing the number of iterations or by playing around with the step size. It might also help to scale your variables first.
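
As a rough illustration of that last point (a sketch only, reusing the x, y, and gradientDesc objects defined above), the non-constant columns can be standardised before running the descent:

# Sketch: standardise every predictor except the intercept column before the descent
x_scaled <- x
x_scaled[, -1] <- scale(x[, -1])

out_scaled <- gradientDesc(x_scaled, y, 0.005, 1e-7, 200000)
out_scaled$converged   # typically converges in far fewer iterations than the raw data

# The coefficients now refer to the standardised predictors, so compare them
# against an OLS fit on the same scaled matrix:
ols_scaled <- solve(t(x_scaled) %*% x_scaled) %*% t(x_scaled) %*% y
data.frame(ols_scaled, out_scaled$coefs)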

