R cor(), method="pearson" returns NA, but method="spearman" returns a value. Why?


Question

I am using R to run correlations on a very large data matrix of approximately 10,000 x 15,000 (events x samples). The data set contains floating point values ranging from -15 to 15, along with NA, NaN, Inf, and -Inf. To simplify the problem, I work with two rows of the matrix at a time; call them vector1 and vector2. The commands are:

CorrelationSpearman = cor(vector1,vector2, method="spearman",use="pairwise.complete.obs")
CorrelationPearson = cor(vector1,vector2,method="pearson",use="pairwise.complete.obs")

For most but not all row vectors in my matrix, I get CorrelationPearson = NA. There seems to be no problem with the CorrelationSpearman values. I have checked that the matrix dimensions are correct, and I have run tests on smaller data, which work fine. What are some possible reasons why this occurs?

Recommended answer

The Pearson correlation coefficient relies on estimating means and (co)variances. An infinite value makes the mean and the variance infinite, and the computation then breaks down (centring involves Inf - Inf, which is NaN). The Spearman and Kendall correlation coefficients are rank-based, so infinite values are simply sorted to the ends of the ranking and cause no trouble (but beware of tied values in your samples!).
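As a minimal sketch of the behaviour described above, the following uses two small made-up vectors (the names `x` and `y` and their values are illustrative assumptions, not the asker's data). It only checks whether each result is missing, not its exact value.

```r
# Hypothetical toy data: one Inf and one NA in x
x <- c(1.2, -0.5, 3.1, Inf, 0.7, NA)
y <- c(0.9, -1.1, 2.8, 4.0, 1.4, 1.5)

# The mean is infinite as soon as one value is infinite
x_mean <- mean(x, na.rm = TRUE)

# use = "pairwise.complete.obs" drops the NA pair but keeps the Inf pair
pearson  <- cor(x, y, method = "pearson",  use = "pairwise.complete.obs")
spearman <- cor(x, y, method = "spearman", use = "pairwise.complete.obs")

is.infinite(x_mean)  # TRUE
is.na(pearson)       # TRUE: the Pearson result is missing
is.na(spearman)      # FALSE: ranks are well defined, Inf just ranks highest
```

Note that the `use` argument only deals with NA and NaN; infinite values are passed through to the computation unchanged.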

Try:

> lix <- is.infinite(vector1) | is.infinite(vector2)
> cor(vector1[!lix], vector2[!lix], method = "pearson", use = "pairwise.complete.obs")

This simply drops every pair in which either value is infinite. To do this more generally, a function like this is helpful:

> inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }
> cor(inf2NA(vector1), inf2NA(vector2), ...)

This just converts infinite values to NA, and then your use argument can handle those NA cases as you see fit.
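Since the question starts from a whole matrix with events in rows, the same idea can be applied to all rows at once. The sketch below is one possible way to do that, assuming a small random stand-in matrix `m` (hypothetical data). Because `cor()` correlates columns, the matrix is transposed first.

```r
inf2NA <- function(x) { x[is.infinite(x)] <- NA; x }

# Hypothetical stand-in for the real data: 4 events x 10 samples
set.seed(1)
m <- matrix(rnorm(40), nrow = 4)
m[1, 3] <- Inf
m[2, 7] <- -Inf
m[3, 5] <- NA

# Without cleaning, rows containing Inf give missing Pearson correlations
raw <- cor(t(m), method = "pearson", use = "pairwise.complete.obs")

# After converting Inf to NA, pairwise deletion handles everything
clean <- cor(t(inf2NA(m)), method = "pearson", use = "pairwise.complete.obs")

anyNA(raw)    # TRUE
anyNA(clean)  # FALSE
dim(clean)    # 4 x 4: one entry per pair of events
```

For a matrix with 10,000 rows the result would be a 10,000 x 10,000 matrix of doubles, roughly 800 MB, so working on blocks of rows may be more practical.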
