Factoring for linear models - Create lm with one factor

Question

The dataset I'm using is too large for a single lm or speedlm calculation.
I want to split my data set into smaller pieces, but when I do, one (or more) of the factor columns ends up containing only one level.
The code below is the minimum needed to reproduce the problem. At the bottom of the question I have put my testing script for those interested.

library(speedglm)

iris$Species <- factor(iris$Species)
i <- iris[1:20,]
summary(i)
speedlm(Sepal.Length ~ Sepal.Width + Species , i)

This gives me the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I have tried to factorize iris$Species but without success. I really don't have a clue how I could fix this now.
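
A quick way to see what triggers the error is to compare the declared levels of Species with the levels actually observed in the chunk. The sketch below is an illustration only: it uses base R's lm() as a stand-in for speedlm() (which, judging by the error above, appears to behave the same way), and the names declared, observed and fit are made up for this example. lm() drops unused factor levels when it builds the model frame, so a chunk containing only "setosa" leaves a one-level factor that cannot be given contrasts.

```r
# Base R only; lm() is used here as a stand-in for speedlm().
i <- iris[1:20, ]

# The factor still declares three levels, but only one is observed in this chunk.
print(table(i$Species))
declared <- nlevels(i$Species)
observed <- nlevels(droplevels(i$Species))

# Capture the error instead of stopping the script.
fit <- tryCatch(
  lm(Sepal.Length ~ Sepal.Width + Species, data = i),
  error = function(e) e
)
if (inherits(fit, "error")) message("lm() failed: ", conditionMessage(fit))
```

This is why calling factor() on iris$Species beforehand does not help: the declared levels were never the problem, the observed ones are.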

How can I include Species into the model? (without increasing the sample size)


I know I only have one level, "setosa", but I still need it included in the linear model because I will eventually update the model with more factor levels, as seen in the example script below.

For those interested, here is an example script of what I will use for my actual dataset:

library(speedglm)

testfunction <- function(start.i, end.i) {
  return(iris[start.i:end.i,])
}

lengthdata <- nrow(iris)
stepsize <- 20

## attempt to factor
iris$Species <- factor(iris$Species)

## Creates the iris dataset in split parts
start.i <- seq(0, lengthdata, stepsize)
end.i   <- pmin(start.i + stepsize, lengthdata)

dat <- Map(testfunction, start.i + 1, end.i)

## Loops through the split iris data
for (i in dat) {
  if (!exists("lmfit")) {
    lmfit  <- speedlm(Sepal.Length ~ Sepal.Width + Species , i)
  } else if (!exists("lmfit2")) {
    lmfit2 <- updateWithMoreData(lmfit, i)
  } else {
    lmfit2 <- updateWithMoreData(lmfit2, i)
  }
}
print(summary(lmfit2))

Recommended answer

There might be a better way, but if you reorder your rows, each split will contain more levels and therefore will not cause the error. I used a random order, but you might prefer a more systematic approach.

library(speedglm)

testfunction <- function(start.i, end.i) {
    return(iris.r[start.i:end.i,])
}

lengthdata <- nrow(iris)
stepsize <- 20

## attempt to factor
iris$Species <- factor(iris$Species)

##Random order
set.seed(1)
iris.r <- iris[sample(nrow(iris)),]

## Creates the iris dataset in split parts
start.i <- seq(0, lengthdata, stepsize)
end.i   <- pmin(start.i + stepsize, lengthdata)

dat <- Map(testfunction, start.i + 1, end.i)

## Loops through the split iris data
for (i in dat) {
    if (!exists("lmfit")) {
        lmfit  <- speedlm(Sepal.Length ~ Sepal.Width + Species , i)
    } else if (!exists("lmfit2")) {
        lmfit2 <- updateWithMoreData(lmfit, i)
    } else {
        lmfit2 <- updateWithMoreData(lmfit2, i)
    }
}
print(summary(lmfit2))
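
A random order makes single-level chunks unlikely but does not rule them out, so it may be worth checking the chunks before fitting. The following is a minimal sketch under stated assumptions: it builds the chunks with split() instead of the Map() helper above, and chunk.id and observed.levels are names introduced only for this illustration.

```r
# Assumption: chunks are built with split() rather than the Map() helper above.
set.seed(1)
iris.r <- iris[sample(nrow(iris)), ]
stepsize <- 20
chunk.id <- ceiling(seq_len(nrow(iris.r)) / stepsize)
dat <- split(iris.r, chunk.id)

# Number of Species levels actually observed in each chunk.
observed.levels <- sapply(dat, function(d) nlevels(droplevels(d$Species)))
print(observed.levels)

# Flag chunks that would trigger the contrasts error.
if (any(observed.levels < 2)) {
  warning("some chunks contain a single Species level; try another ordering")
}
```

If the warning fires, a different seed or a systematic ordering should fix it.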

Edit: Instead of the random order, you can use modulo division to generate a spread-out index vector in a systematic way:

spred.i <- seq(1, by = 7, length.out = 150) %% 150 + 1
iris.r <- iris[spred.i,]
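
The step of 7 works here because 7 and 150 share no common factor, so the index vector visits every row exactly once. As a rough check (is.permutation and first.chunk are names made up for this sketch), you can confirm that and look at which species land in the first chunk of 20 rows:

```r
spred.i <- seq(1, by = 7, length.out = 150) %% 150 + 1

# Every row index from 1 to 150 should appear exactly once.
is.permutation <- all(sort(spred.i) == seq_len(150))
print(is.permutation)

# Species observed in the first chunk of 20 reordered rows.
iris.r <- iris[spred.i, ]
first.chunk <- droplevels(iris.r$Species[1:20])
print(table(first.chunk))
```

For a data set of a different size, you would need to pick a step that is coprime with the number of rows, otherwise some rows would be repeated and others skipped.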
