How to compute a running cor.test() in a data.frame with NA values in R?


Question

I'm trying to compute a running correlation (with running() from gtools) between my daily climate variables, but my data.frame has many missing values (NA). I'm using cor.test() because I need the p-values. For example, on some days I have no precipitation or humidity values, and I would like to compute the running correlation with my temperature data while omitting the NA values.

Here is an example with NA values:

library(gtools)
df <- data.frame(temp=rnorm(100, 10:30), prec=rnorm(100, 1:300), humi=rnorm(100, 1:100))

df$prec[c(1:10, 25:30, 95:100)] <-NA
df$humi[c(15:19, 20:25, 80:90)] <-NA

corPREC <- t(running(df$temp, df$prec, fun = cor.test, width=10, by=10))
corHUMI <- t(running(df$temp, df$humi, fun = cor.test, width=10, by=10))

Recommended answer

You can use complete.cases() to get a logical vector marking the complete rows (TRUE = complete), and then subset with it inside an ad-hoc function that is passed to running() in place of cor.test:

library(gtools)
df <- data.frame(temp=rnorm(100, 10:30), prec=rnorm(100, 1:300),
                 humi=rnorm(100, 1:100))

df$prec[c(1:10, 25:30, 95:100)] <-NA
df$humi[c(15:19, 20:25, 80:90)] <-NA

my.fun <- function(x,y) {
    my.df <- data.frame(x,y)
    my.df.cmpl <- my.df[complete.cases(my.df), ]

    # cor.test needs at least 3 complete obs, but it only returns a
    # confidence interval with 4 or more, so require 4 to always get 4 values
    if (nrow(my.df.cmpl) < 4) {
        return(rep(NA, 4))
    } else {
        my.test <- cor.test(my.df.cmpl$x,my.df.cmpl$y)
        return(c(my.test$statistic, my.test$p.value,
                 my.test$conf.int))
    }

}

corPREC <- t(running(df$temp, df$prec, fun = my.fun, width=10, by=10))
corHUMI <- t(running(df$temp, df$humi, fun = my.fun, width=10, by=10))
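To make the windowing explicit, the same idea can be sketched in base R without gtools. This is only an illustration: the helper name block_cor_test, the column labels, and the simulated data are made up here, and it assumes non-overlapping blocks of `width` consecutive rows, which is what width = 10 with by = 10 should amount to.

```r
# Hypothetical helper: Pearson cor.test() on non-overlapping blocks of rows.
block_cor_test <- function(x, y, width = 10) {
  starts <- seq(1, length(x) - width + 1, by = width)
  res <- sapply(starts, function(s) {
    idx <- s:(s + width - 1)
    ok <- complete.cases(x[idx], y[idx])   # TRUE where both values are present
    # Fewer than 3 complete pairs makes cor.test() stop with an error, and
    # exactly 3 gives no confidence interval, so return NAs below 4 pairs.
    if (sum(ok) < 4) return(rep(NA_real_, 4))
    tst <- cor.test(x[idx][ok], y[idx][ok])
    c(unname(tst$statistic), tst$p.value, tst$conf.int)
  })
  res <- t(res)                            # one row per block
  colnames(res) <- c("t", "p.value", "conf.low", "conf.high")
  rownames(res) <- paste(starts, starts + width - 1, sep = ":")
  res
}

# Made-up data with the same NA pattern as prec in the question
set.seed(42)
df <- data.frame(temp = rnorm(100, 20, 5), prec = rnorm(100, 50, 10))
df$prec[c(1:10, 25:30, 95:100)] <- NA
corPREC <- block_cor_test(df$temp, df$prec)
print(corPREC)
```

The first row (1:10) comes back as all NA because prec is missing for the whole block, while blocks with at least 4 complete pairs get a full row of results.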

You could also consider the formula interface inside the function:

my.test <- cor.test(~ x + y, na.action = "na.exclude", data = my.df)

but this does not handle the too-few-rows situation in a straightforward manner: cor.test() still stops with an error when a window is left with fewer than 3 complete pairs.
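One possible way around the too-few-rows problem with the formula interface is to catch the error and return NA instead. The following is a minimal sketch, not part of the original answer: the name safe_cor_test is hypothetical, and it returns only the correlation estimate and the p-value so that the result always has the same length.

```r
# Hypothetical wrapper: formula-based cor.test() that yields NAs instead of
# an error when too few complete pairs remain in a window.
safe_cor_test <- function(x, y) {
  my.df <- data.frame(x = x, y = y)
  res <- tryCatch(
    cor.test(~ x + y, data = my.df, na.action = na.omit),
    error = function(e) NULL   # e.g. "not enough finite observations"
  )
  if (is.null(res)) {
    return(c(estimate = NA_real_, p.value = NA_real_))
  }
  c(estimate = unname(res$estimate), p.value = res$p.value)
}

# Usage on made-up vectors
set.seed(1)
x <- rnorm(10)
y <- x + rnorm(10)
full <- safe_cor_test(x, y)                        # enough data: two numbers
sparse <- safe_cor_test(c(1, 2, NA), c(NA, 1, 2))  # one complete pair: NAs
print(full)
print(sparse)
```

A function like this could be passed as fun to running() in the same way as my.fun, with the caveat that tryCatch() also hides any other error cor.test() might raise.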
