lm called from inside dlply throws "0 (non-NA) cases" error [r]
Problem description
I'm using dlply() with a custom function that averages slopes of lm() fits on data that contain some NA values, and I get the error "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases"
This error only happens when I call dlply with two key variables; splitting by one variable works fine.
Annoyingly, I can't reproduce the error with a simple dataset, so I've posted the problem dataset to my Dropbox.
Here's the code, as minimized as possible while still producing an error:
library(plyr)  # provides dlply()

masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na.strings="#N/A")
workingData <- data.frame(sample = masterData$sample,
                          substrate = masterData$substrate,
                          el1 = masterData$elapsedHr1,
                          F1 = masterData$r1 - masterData$rK)

# This function is trivial as written; in reality it takes the average of many slopes
meanSlope <- function(df) {
  lm1 <- lm(F1 ~ el1, data = df, na.action = na.omit)  # changing to na.exclude doesn't help
  slope1 <- coef(lm1)[2]
  mean(c(slope1))
}

lsGOOD <- dlply(workingData, .(sample), meanSlope)             # works fine
lsBAD  <- dlply(workingData, .(sample, substrate), meanSlope)  # throws error
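A minimal sketch of how the error can arise, using hypothetical toy data rather than the posted dataset (the names toy and fitSlope are illustrative only). When every F1 value in one sample × substrate cell is NA, na.omit leaves lm() with zero rows. Base R split() and lapply() stand in for dlply() here so the example runs without plyr:

```r
# Hypothetical toy data: every row of sample "B", substrate 2 has a missing
# response, mimicking the all-NA cells in the real dataset.
toy <- data.frame(
  sample    = rep(c("A", "B"), each = 6),
  substrate = rep(rep(1:2, each = 3), times = 2),
  el1       = rep(c(0, 1, 2), times = 4),
  F1        = c(1, 2, 3, 2, 4, 6, 1, 3, 5, NA, NA, NA)
)

# Hypothetical helper: fit F1 ~ el1 in one group and return the slope.
fitSlope <- function(df) coef(lm(F1 ~ el1, data = df, na.action = na.omit))[2]

# Splitting by sample alone: each group still has non-NA rows, so lm() succeeds.
bySample <- lapply(split(toy, toy$sample), fitSlope)

# Splitting by sample and substrate isolates the all-NA cell, and lm() fails.
byBoth <- tryCatch(
  lapply(split(toy, list(toy$sample, toy$substrate)), fitSlope),
  error = function(e) conditionMessage(e)
)
print(byBoth)
```

Splitting by sample alone keeps some non-missing rows in every group, which matches the observation that the one-variable call works.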
Thanks in advance for any insight.
Recommended answer
For several of your cross-classifications you have missing covariates:
with(masterData, table(sample, substrate, r1mis = is.na(r1) ) )
# ... (snipped the non-missing reports)
, , r1mis = TRUE
substrate
sample 1 2 3 4 5 6 7 8
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 3 3
8 0 0 0 0 0 0 0 3
9 0 0 0 0 0 0 0 3
10 0 0 0 0 0 0 0 3
11 0 0 0 0 0 0 0 3
12 0 0 0 0 0 0 0 3
13 0 0 0 0 0 0 0 3
14 0 0 0 0 0 0 0 3
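Beyond tabulating NAs in r1, one can count, per cell, the rows where both the response and the covariate are present, which is what lm() actually needs. A sketch on hypothetical toy data (not the posted dataset), using base R complete.cases() and tapply():

```r
# Hypothetical toy data: cell (sample "B", substrate 2) has only NA responses.
toy <- data.frame(
  sample    = rep(c("A", "B"), each = 6),
  substrate = rep(rep(1:2, each = 3), times = 2),
  el1       = rep(c(0, 1, 2), times = 4),
  F1        = c(1, 2, 3, 2, 4, 6, 1, 3, 5, NA, NA, NA)
)

# Rows per sample x substrate cell where both F1 and el1 are non-missing.
usable <- with(toy, tapply(complete.cases(F1, el1), list(sample, substrate), sum))
print(usable)

# Cells with fewer than two usable rows cannot support a slope estimate.
print(which(usable < 2, arr.ind = TRUE))
```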
This lets you skip the subsets that have insufficient data in this particular dataset:
meanSlope <- function(df) {
  # Skip subsets with fewer than two non-missing el1 values: too few to estimate a slope
  if (sum(!is.na(df$el1)) < 2) {
    return(NA)
  }
  lm1 <- lm(F1 ~ el1, data = df, na.action = na.omit)
  slope1 <- coef(lm1)[2]
  mean(c(slope1))
}
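The guard above counts non-missing values of el1 only. A variant that counts rows where both F1 and el1 are present (via base R's complete.cases()) does not depend on which column holds the NAs. This is a sketch assuming the column names from workingData; the function name meanSlopeCC and the toy data are hypothetical, and base R split()/sapply() stand in for dlply() so it runs without plyr:

```r
# Hypothetical variant of meanSlope: require at least two complete (F1, el1) rows.
meanSlopeCC <- function(df) {
  ok <- complete.cases(df$F1, df$el1)
  if (sum(ok) < 2) return(NA_real_)
  lm1 <- lm(F1 ~ el1, data = df[ok, ])
  unname(coef(lm1)[2])
}

# Hypothetical toy data: cell (sample "B", substrate 2) has only NA responses.
toy <- data.frame(
  sample    = rep(c("A", "B"), each = 6),
  substrate = rep(rep(1:2, each = 3), times = 2),
  el1       = rep(c(0, 1, 2), times = 4),
  F1        = c(1, 2, 3, 2, 4, 6, 1, 3, 5, NA, NA, NA)
)

slopes <- sapply(split(toy, list(toy$sample, toy$substrate)), meanSlopeCC)
print(slopes)
```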
The el1 check in meanSlope relies on the missingness being in one particular covariate, though. A more robust solution is to use try to capture the errors and convert them to NA values. See the help page:
?try
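A minimal sketch of the try() approach, assuming the F1/el1 column names from workingData; safeSlope and the toy data are hypothetical. Any error raised by lm() in a subset, not only the "0 (non-NA) cases" one, is turned into NA:

```r
# Hypothetical toy data: cell (sample "B", substrate 2) has only NA responses.
toy <- data.frame(
  sample    = rep(c("A", "B"), each = 6),
  substrate = rep(rep(1:2, each = 3), times = 2),
  el1       = rep(c(0, 1, 2), times = 4),
  F1        = c(1, 2, 3, 2, 4, 6, 1, 3, 5, NA, NA, NA)
)

# Hypothetical wrapper: fit the model inside try(); on any error, return NA.
safeSlope <- function(df) {
  fit <- try(lm(F1 ~ el1, data = df, na.action = na.omit), silent = TRUE)
  if (inherits(fit, "try-error")) return(NA_real_)
  unname(coef(fit)[2])
}

slopes <- sapply(split(toy, list(toy$sample, toy$substrate)), safeSlope)
print(slopes)
```

With plyr loaded, the same function can be passed as dlply(workingData, .(sample, substrate), safeSlope). Because every error becomes NA, unrelated problems in a subset are hidden too, so it may be worth checking which subsets came back NA.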