Correlation between NA columns


Problem description

I have to write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate (two columns) from each file where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no files meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows.

My code looks like this:

corr <- function(directory,threshold=0){
    a<-list.files("specdata")
    for (i in a) {
        data <- read.csv(paste(directory, "/", i, sep =""))
        x<-complete.cases(data)
        j<-sum(as.numeric(x))
        sulfate<-data[,2]
        nitrate<-data[,3]
        b<-cor(sulfate,nitrate)
    }  
    if (j>threshold) 
        return(b) 
    else
        numeric()
}

There's no error message. If I type

z<-corr("specdata")
head(z)
[1] NA

I don't know what the problem is. I don't know if NA values in the columns have to do with it. I think something is missing in my code. I think the read.csv creates a unique data frame when I need one data frame per file but I don't see why the return is NA in this case (when there's no threshold).

However, if I introduce a bigger threshold (1000):

z<-corr("specdata",1000)
head(z)
numeric(0)

The expected output I need is

cr <- corr("specdata", 150) 
head(cr) 
[1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814

Solution

This is a correct, working solution that you can refer to:

corr <- function(directory, threshold = 0) {
  ## 'directory' is a character vector of length 1 indicating the location of
  ## the CSV files

  ## 'threshold' is a numeric vector of length 1 indicating the number of
  ## completely observed observations (on all variables) required to compute
  ## the correlation between nitrate and sulfate; the default is 0

  ## Return a numeric vector of correlations
  df = complete(directory)
  ids = df[df["nobs"] > threshold, ]$id
  corrr = numeric()
  for (i in ids) {

    newRead = read.csv(paste(directory, "/", formatC(i, width = 3, flag = "0"), 
                             ".csv", sep = ""))
    dff = newRead[complete.cases(newRead), ]
    corrr = c(corrr, cor(dff$sulfate, dff$nitrate))
  }
  return(corrr)
}
complete <- function(directory, id = 1:332) {
  f <- function(i) {
    data = read.csv(paste(directory, "/", formatC(i, width = 3, flag = "0"), 
                          ".csv", sep = ""))
    sum(complete.cases(data))
  }
  nobs = sapply(id, f)
  return(data.frame(id, nobs))
}
cr <- corr("specdata", 150)
head(cr)
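
As a side note on why the code in the question returned NA: `cor()` propagates missing values by default, so one NA in either column makes the whole result NA. In addition, the `if` test sat outside the `for` loop, so only the value from the last file read survived, and `list.files("specdata")` ignored the `directory` argument. The following is a minimal sketch of those points, not a replacement for the solution above. The function name `corr_sketch` and the two small demo files are hypothetical and exist only for illustration; the sketch also assumes every `.csv` file in the directory has `sulfate` and `nitrate` columns.

```r
# cor() returns NA as soon as either vector contains a missing value.
sulfate <- c(1.2, NA, 3.4, 2.2, 5.0)
nitrate <- c(0.5, 0.7, NA, 0.9, 1.4)
na_result <- cor(sulfate, nitrate)                      # NA
ok_result <- cor(sulfate, nitrate, use = "complete.obs") # only complete pairs

# Hypothetical variant of corr(): the threshold test and the accumulation
# both happen inside the loop, once per file.
corr_sketch <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric()
  for (f in files) {
    data <- read.csv(f)
    ok <- data[complete.cases(data), ]   # keep fully observed rows only
    if (nrow(ok) > threshold) {
      result <- c(result, cor(ok$sulfate, ok$nitrate))
    }
  }
  result
}

# Demo data (made up): two tiny monitor files in a fresh temporary directory.
demo_dir <- tempfile("corr_demo")
dir.create(demo_dir)
write.csv(data.frame(sulfate = c(1, 2, NA, 4),
                     nitrate = c(2, 4, 6, NA),
                     ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, 2, 3, NA),
                     nitrate = c(3, 2, 1, 5),
                     ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

all_files  <- corr_sketch(demo_dir)       # one correlation per file
high_bar   <- corr_sketch(demo_dir, 10)   # no file has more than 10 complete rows
```

With the demo data, the first file has two complete rows and the second has three, so a threshold of 10 yields a numeric vector of length 0, which matches the behavior the assignment asks for.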
