Factoring for linear models - Create lm with one factor
Problem description
The dataset I'm using is too large for a single lm or speedlm calculation.
I want to split my data set into smaller pieces, but when I do, one (or more) of the factor columns ends up containing only one level.
The code below is the minimum needed to reproduce the problem. At the bottom of the question I have put my testing script for those interested.
library(speedglm)
iris$Species <- factor(iris$Species)
i <- iris[1:20,]
summary(i)
speedlm(Sepal.Length ~ Sepal.Width + Species , i)
This gives me the following error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
I have tried converting iris$Species to a factor, but without success. I really don't know how to fix this.
How can I include Species in the model (without increasing the sample size)?
I know this subset has only one level, "setosa", but I still need Species in the linear model because I will eventually update the model with data containing more levels, as shown in the example script below.
For those interested, here is an example script of what I will use for my actual dataset:
library(speedglm)
testfunction <- function(start.i, end.i) {
return(iris[start.i:end.i,])
}
lengthdata <- nrow(iris)
stepsize <- 20
## attempt to factor
iris$Species <- factor(iris$Species)
## Creates the iris dataset in split parts
start.i <- seq(0, lengthdata, stepsize)
end.i <- pmin(start.i + stepsize, lengthdata)
dat <- Map(testfunction, start.i + 1, end.i)
## Loops through the split iris data
for (i in dat) {
if (!exists("lmfit")) {
lmfit <- speedlm(Sepal.Length ~ Sepal.Width + Species , i)
} else if (!exists("lmfit2")) {
lmfit2 <- updateWithMoreData(lmfit, i)
} else {
lmfit2 <- updateWithMoreData(lmfit2, i)
}
}
print(summary(lmfit2))
Recommended answer
There might be a better way, but if you reorder your rows, each split will contain more levels and therefore will not cause the error. I used a random order, but you might prefer a more systematic approach.
library(speedglm)
testfunction <- function(start.i, end.i) {
return(iris.r[start.i:end.i,])
}
lengthdata <- nrow(iris)
stepsize <- 20
## attempt to factor
iris$Species <- factor(iris$Species)
##Random order
set.seed(1)
iris.r <- iris[sample(nrow(iris)),]
## Creates the iris dataset in split parts
start.i <- seq(0, lengthdata, stepsize)
end.i <- pmin(start.i + stepsize, lengthdata)
dat <- Map(testfunction, start.i + 1, end.i)
## Loops through the split iris data
for (i in dat) {
if (!exists("lmfit")) {
lmfit <- speedlm(Sepal.Length ~ Sepal.Width + Species , i)
} else if (!exists("lmfit2")) {
lmfit2 <- updateWithMoreData(lmfit, i)
} else {
lmfit2 <- updateWithMoreData(lmfit2, i)
}
}
print(summary(lmfit2))
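To see why the row order matters, it may help to count how many distinct Species values each 20-row chunk contains. The following is only a rough sketch in base R; `levels_per_chunk` and its `chunk_size` argument are hypothetical names introduced here for illustration and are not part of speedglm.

```r
# Count how many distinct Species values occur in each chunk of `df`.
# `levels_per_chunk` and `chunk_size` are hypothetical names for this sketch.
levels_per_chunk <- function(df, chunk_size = 20) {
  # Assign rows 1..chunk_size to chunk 1, the next chunk_size rows to chunk 2, ...
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  vapply(split(df$Species, chunk_id),
         function(x) nlevels(droplevels(x)),
         integer(1))
}

# Original row order: iris is sorted by species, so most chunks hold one species
original <- levels_per_chunk(iris)

# Shuffled row order, as in the script above
set.seed(1)
shuffled <- levels_per_chunk(iris[sample(nrow(iris)), ])

print(original)
print(shuffled)
```

Any chunk that reports a single level would trigger the contrasts error if it were fitted on its own, so a check like this can be run before starting the fitting loop.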
Edit: Instead of a random order, you can use modulo arithmetic to generate a spread-out index vector systematically:
spred.i <- seq(1, by = 7, length.out = 150) %% 150 + 1
iris.r <- iris[spred.i,]
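As a sanity check on this index vector, one could confirm that it visits every row exactly once (which holds because 7 and 150 share no common factor) and that each 20-row block mixes species. This is a minimal base R sketch; `is_permutation`, `chunk_id` and `species_per_chunk` are names made up for the illustration.

```r
spred.i <- seq(1, by = 7, length.out = 150) %% 150 + 1

# Every row index from 1 to 150 should appear exactly once
is_permutation <- all(sort(spred.i) == 1:150)

# Number of distinct species in each block of 20 reordered rows
chunk_id <- ceiling(seq_along(spred.i) / 20)
species_per_chunk <- tapply(iris$Species[spred.i], chunk_id,
                            function(x) length(unique(x)))

print(is_permutation)
print(species_per_chunk)
```

For a different data set, the step (7 here) and the block size would need to be chosen so that every block still contains at least two levels of each factor.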