complete.obs in the cor() function

Question

I am building a correlation matrix for my data, which looks like this:

df <- structure(list(V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15
), V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200), V3 = c(NA, 
NA, 24, 51, 53, 231, NA, 153, 6, 700), V4 = c(2, 10, NA, 20, 
56, 1, 1, 53, 40, 5000)), .Names = c("V1", "V2", "V3", "V4"), row.names = c(NA, 
10L), class = "data.frame")

This gives the following data frame:

        V1  V2  V3   V4
    1   56  21  NA    2
    2  123 231  NA   10
    3  546   5  24   NA
    4   26   5  51   20
    5   62  32  53   56
    6    6  NA 231    1
    7   NA   1  NA    1
    8   NA 231 153   53
    9   NA   5   6   40
    10  15 200 700 5000

I normally build my correlation matrix with the complete.obs option, using this command:

crm <- cor(df, use="complete.obs", method="pearson") 

My question is: how does complete.obs treat the data? Does it omit every row that has an NA value, build an NA-free table, and compute the correlation matrix from that table in one go, like this?

df2 <- structure(list(V1 = c(26, 62, 15), V2 = c(5, 32, 200), V3 = c(51, 
53, 700), V4 = c(20, 56, 5000)), .Names = c("V1", "V2", "V3", 
"V4"), row.names = c(NA, 3L), class = "data.frame")

Or does it omit NA values in a pairwise fashion? For example, when calculating the correlation between V1 and V2, are the rows that contain an NA in V3 (such as rows 1 and 2 in my example) omitted too?

If the latter rows are omitted as well, I am looking for a command that keeps as much of the data as possible by omitting NA values in a pairwise fashion.

Many thanks,

Recommended answer

Look at the help file for cor, i.e. ?cor. In particular,

If ‘use’ is ‘"everything"’, ‘NA’s will propagate conceptually, i.e., a resulting value will be ‘NA’ whenever one of its contributing observations is ‘NA’.

If ‘use’ is ‘"all.obs"’, then the presence of missing observations will produce an error. If ‘use’ is ‘"complete.obs"’ then missing values are handled by casewise deletion (and if there are no complete cases, that gives an error).
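The casewise deletion described in the help text can be checked directly against the data from the question. The following is a minimal sketch, not part of the original answer: it rebuilds the same data with data.frame(), and the names complete_rows, by_hand and built_in are introduced here only for illustration.

```r
# Same data as in the question, rebuilt with data.frame() for brevity
df <- data.frame(
  V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
  V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
  V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
  V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)
)

# Rows with no NA in any column are the "complete cases"
complete_rows <- complete.cases(df)
which(complete_rows)

# Casewise deletion by hand: drop incomplete rows first, then correlate
by_hand <- cor(df[complete_rows, ], method = "pearson")

# The built-in option applied to the full data frame
built_in <- cor(df, use = "complete.obs", method = "pearson")

# The two matrices should agree up to numerical tolerance
all.equal(by_hand, built_in)
```

If the two results agree, complete.obs is working from the NA-free table, which for this data consists of rows 4, 5 and 10, the same rows as in df2 from the question.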

To get a better feel for what is going on, create an (even) simpler example:

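# Smaller example: first five rows, columns V1 to V3 (rows 1 and 2 have NA in V3)
df1 <- df[1:5, 1:3]
# Pairwise deletion: each pair of columns uses every row where both are non-NA
cor(df1, use = "pairwise.complete.obs", method = "pearson")
# Casewise deletion: any row containing an NA is dropped before computing
cor(df1, use = "complete.obs", method = "pearson")
# Same result as complete.obs: rows 3 to 5 are the only complete rows
cor(df1[3:5, ], method = "pearson")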

So, when we use complete.obs, the entire row is discarded if it contains an NA. In my example, this means rows 1 and 2 are discarded. With pairwise.complete.obs, however, the correlation between V1 and V2 is computed from every row in which both V1 and V2 are non-NA, so rows 1 and 2 are still used for that pair.
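To keep as much data as possible on the full data frame from the question, the same pairwise option can be applied there. The sketch below is an illustration rather than part of the original answer; the names crm_pairwise, n_pairs and ok are hypothetical. It also counts how many rows contribute to each pair, which is worth inspecting because each entry of a pairwise matrix may rest on a different subset of rows.

```r
# Same data as in the question, rebuilt with data.frame() for brevity
df <- data.frame(
  V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
  V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
  V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
  V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)
)

# Pairwise deletion: each entry uses all rows where both columns are non-NA
crm_pairwise <- cor(df, use = "pairwise.complete.obs", method = "pearson")

# Number of rows contributing to each pair of columns:
# !is.na(df) is a logical matrix, and its cross-product counts joint presence
n_pairs <- crossprod(!is.na(df))
n_pairs

# Check one entry by hand: V1 versus V2, ignoring V3 and V4 entirely
ok <- !is.na(df$V1) & !is.na(df$V2)
all.equal(crm_pairwise["V1", "V2"], cor(df$V1[ok], df$V2[ok]))
```

For V1 and V2, six rows are available under pairwise deletion, compared with three under complete.obs. One caveat noted in ?cor is that pairwise deletion can produce a matrix that is not positive semi-definite, so it may not be suitable for every downstream use.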
