lm called from inside dlply throws "0 (non-NA) cases" error [r]

Problem description

I'm using dlply() with a custom function that averages slopes of lm() fits on data that contain some NA values, and I get the error "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases"

This error only happens when I call dlply with two key variables - separating by one variable works fine.

Annoyingly I can't reproduce the error with a simple dataset, so I've posted the problem dataset in my dropbox.

Here's the code, as minimized as possible while still producing an error:

library(plyr)  # provides dlply() and .()

masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na.strings="#N/A")

workingData <- data.frame(sample = masterData$sample,
                      substrate = masterData$substrate,
                      el1 = masterData$elapsedHr1,
                      F1 = masterData$r1 - masterData$rK)

#This function is trivial as written; in reality it takes the average of many slopes
meanSlope <- function(df) {
     lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
     slope1 <- lm1$coefficients[2]
     meanSlope <- mean(c(slope1)) 
}

lsGOOD <- dlply(workingData, .(sample), meanSlope) #works fine

lsBAD <- dlply(workingData, .(sample, substrate), meanSlope) #throws error

Thanks in advance for any insights.

Recommended answer

For several of your cross-classifications you have missing covariates:

with(masterData, table(sample, substrate, r1mis = is.na(r1)))
# snipped the nonmissing reports
, , r1mis = TRUE

      substrate
sample 1 2 3 4 5 6 7 8
    3  0 0 0 0 0 0 0 0
    4  0 0 0 0 0 0 0 0
    5  0 0 0 0 0 0 0 0
    6  0 0 0 0 0 0 0 0
    7  0 0 0 0 0 0 3 3
    8  0 0 0 0 0 0 0 3
    9  0 0 0 0 0 0 0 3
    10 0 0 0 0 0 0 0 3
    11 0 0 0 0 0 0 0 3
    12 0 0 0 0 0 0 0 3
    13 0 0 0 0 0 0 0 3
    14 0 0 0 0 0 0 0 3

This would let you skip over the subsets with insufficient data in this particular data:

meanSlope <- function(df) {
     # Skip subsets with fewer than two non-missing el1 values
     if (sum(!is.na(df$el1)) < 2) {
          return(NA)
     }
     lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit)
     slope1 <- lm1$coefficients[2]
     mean(c(slope1))
}

Although it depends on the missingness being in one particular covariate. A more robust solution would be to use try to capture errors and convert to NA's.

?try
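
As a rough illustration of the try approach, here is a minimal sketch. The toy data frame and the name safeSlope are made up for this example and are not from the original post, and base R's split() and lapply() stand in for dlply() so that the snippet runs without any packages.

```r
# Hypothetical toy data: group "b" has no non-NA responses, so lm() fails there
toy <- data.frame(
    grp = rep(c("a", "b"), each = 3),
    x   = c(1, 2, 3, 1, 2, 3),
    y   = c(2, 4, 6, NA, NA, NA)
)

# safeSlope is an illustrative name, not part of the original code
safeSlope <- function(df) {
    # try() returns an object of class "try-error" instead of stopping
    fit <- try(lm(y ~ x, data = df, na.action = na.omit), silent = TRUE)
    if (inherits(fit, "try-error")) {
        return(NA_real_)  # convert any fitting error to NA
    }
    unname(coef(fit)[2])  # slope of the fitted line
}

# One result per group; the failing group yields NA rather than an error
slopes <- lapply(split(toy, toy$grp), safeSlope)
```

With plyr loaded, the same kind of function should be usable as the third argument to dlply(), provided the formula refers to the actual column names (F1 and el1 in the question's data).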
