R中的线性回归而不将数据复制到内存中? [英] linear regression in R without copying data in memory?

查看：60 发布时间：2020/4/30 12:24:02 r memory-management linear-regression lm

本文介绍了R中的线性回归而不将数据复制到内存中?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

进行线性回归的标准方法如下:

The standard way of doing a linear regression is something like this:

l <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data=iris)

，然后使用predict(l, new_data)进行预测，其中new_data是具有与公式匹配的列的数据框.但是lm()返回一个lm对象，该对象列表包含一些垃圾内容，这些东西在大多数情况下都是无关紧要的.这包括原始数据的副本，以及一堆命名的向量和数组，这些向量和数组的长度/大小为数据:

and then use predict(l, new_data) to make predictions, where new_data is a dataframe with columns matching the formula. But lm() returns an lm object, which is a list that contains crap-loads of stuff that is mostly irrelevant in most situations. This includes a copy of the original data, and a bunch of named vectors and arrays the length/size of the data:

R> str(l)
List of 12
 $ coefficients : Named num [1:3] 3.587 -0.257 0.364
  ..- attr(*, "names")= chr [1:3] "(Intercept)" "Petal.Length" "Petal.Width"
 $ residuals    : Named num [1:150] 0.2 -0.3 -0.126 -0.174 0.3 ...
  ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
 $ effects      : Named num [1:150] -37.445 -2.279 -0.914 -0.164 0.313 ...
  ..- attr(*, "names")= chr [1:150] "(Intercept)" "Petal.Length" "Petal.Width" "" ...
 $ rank         : int 3
 $ fitted.values: Named num [1:150] 3.3 3.3 3.33 3.27 3.3 ...
  ..- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
 $ assign       : int [1:3] 0 1 2
 $ qr           :List of 5
  ..$ qr   : num [1:150, 1:3] -12.2474 0.0816 0.0816 0.0816 0.0816 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:150] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:3] "(Intercept)" "Petal.Length" "Petal.Width"
  .. ..- attr(*, "assign")= int [1:3] 0 1 2
  ..$ qraux: num [1:3] 1.08 1.1 1.01
  ..$ pivot: int [1:3] 1 2 3
  ..$ tol  : num 1e-07
  ..$ rank : int 3
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 147
 $ xlevels      : Named list()
 $ call         : language lm(formula = Sepal.Width ~ Petal.Length + Petal.Width, data = iris)
 $ terms        :Classes 'terms', 'formula' length 3 Sepal.Width ~ Petal.Length + Petal.Width
  .. ..- attr(*, "variables")= language list(Sepal.Width, Petal.Length, Petal.Width)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:3] "Sepal.Width" "Petal.Length" "Petal.Width"
  .. .. .. ..$ : chr [1:2] "Petal.Length" "Petal.Width"
  .. ..- attr(*, "term.labels")= chr [1:2] "Petal.Length" "Petal.Width"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(Sepal.Width, Petal.Length, Petal.Width)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:3] "Sepal.Width" "Petal.Length" "Petal.Width"
 $ model        :'data.frame':  150 obs. of  3 variables:
  ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 Sepal.Width ~ Petal.Length + Petal.Width
  .. .. ..- attr(*, "variables")= language list(Sepal.Width, Petal.Length, Petal.Width)
  .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:3] "Sepal.Width" "Petal.Length" "Petal.Width"
  .. .. .. .. ..$ : chr [1:2] "Petal.Length" "Petal.Width"
  .. .. ..- attr(*, "term.labels")= chr [1:2] "Petal.Length" "Petal.Width"
  .. .. ..- attr(*, "order")= int [1:2] 1 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(Sepal.Width, Petal.Length, Petal.Width)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:3] "Sepal.Width" "Petal.Length" "Petal.Width"
 - attr(*, "class")= chr "lm"

这些东西占用了大量空间，并且lm对象最终比原始数据集大了一个数量级:

That stuff takes up a lot of space, and the lm object ends up being almost an order of magnitude larger than the original dataset:

R> object.size(iris)
7088 bytes
R> object.size(l)
52704 bytes

这对于这么小的数据集来说不是问题，但是对于产生450mb lm对象的170Mb数据集来说，这确实是个问题.即使将所有返回选项设置为false，lm对象仍然是原始数据集的5倍:

This isn't a problem with a dataset as small as that, but it can be really problematic with a 170Mb dataset that produces a 450mb lm object. Even with all the return options set to false, the lm object is still 5 times the original dataset:

R> ls <- lm(Sepal.Width ~ Petal.Length + Petal.Width, data=iris, model=FALSE, x=FALSE, y=FALSE, qr=FALSE)
R> object.size(ls)
30568 bytes

有什么方法可以在R中拟合模型，然后能够预测新输入数据的输出值，无需存储大量多余的不必要数据?换句话说，有没有一种方法可以只是存储模型系数，但是仍然能够使用这些系数对新数据进行预测?

Is there any way of fitting a model in R, and then being able to predict output values on new input data, without storing crap tonnes of extra unnecessary data? In other words, is there a way to just store the model coefficients, but still be able to use those coefficients to predict on new data?

我想，除了不存储所有多余的数据外，我还对使用lm的一种方式非常感兴趣，这样它甚至不计算该数据-这只是在浪费CPU时间...

I guess, as well as not storing all that excess data, I'm also really interested in a way of using lm so that it doesn't even calculate that data - it's just wasted CPU time...

R中的线性回归而不将数据复制到内存中? [英] linear regression in R without copying data in memory?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

R中的线性回归而不将数据复制到内存中? [英] linear regression in R without copying data in memory?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭