Why is the Pearson correlation output NaN?

Question

I'm trying to get the Pearson correlation coefficient between two variables in R. This is the scatterplot of the variables:

ggplot(results_summary, aes(x =D_in, y = D_ex)) + geom_point(col=ifelse(results_summary$FDR < 0.05, ifelse(results_summary$logF>0, "red", "green" ), "black"))

As you can see, the variables correlate pretty well, so I'm expecting a high correlation coefficient. However, when I try to get the Pearson correlation coefficient, I get NaN!

> cor(results_summary$D_in, results_summary$D_ex, method="spearman")
[1] 0.868079
> cor(results_summary$D_in, results_summary$D_ex, method="kendall")
[1] 0.6973086
> cor(results_summary$D_in, results_summary$D_ex, method="pearson")
[1] NaN

I checked whether my data contains any NaN:

> nrow(subset(results_summary, is.nan(results_summary$D_ex)==TRUE)) 
[1] 0
> nrow(subset(results_summary, is.nan(results_summary$D_in)==TRUE)) 
[1] 0
> cor(results_summary$D_in, results_summary$D_ex, method="pearson", use="complete.obs")
[1] NaN

But that doesn't seem to be the cause of the NaN. Can someone give me a clue about what might be happening here?

Thanks for your time!

Recommended answer

That seems odd. My guess is that there is some problem with the input data that the check you mentioned did not reveal (is.nan() does not flag infinite values, for example). I suggest running:

any(!is.finite(results_summary$D_in))

any(!is.finite(results_summary$D_ex))
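
One way the two checks can disagree, sketched on made-up data: is.nan() returns FALSE for Inf, so an infinite value passes the NaN check, yet a single Inf is enough to make the Pearson coefficient NaN while the rank-based Spearman coefficient stays finite. The vectors x and y below are hypothetical and are not the asker's data; whether this is what happened in results_summary is only a guess.

```r
# Hypothetical toy data; one infinite value stands in for e.g. a division by zero.
x <- c(1, 2, 3, 4, Inf)
y <- c(2, 4, 5, 9, 10)

nan_check    <- any(is.nan(x))       # Inf is not NaN, so this check finds nothing
finite_check <- any(!is.finite(x))   # is.finite() is FALSE for Inf, -Inf, NA and NaN

# Pearson uses the mean and variance of x, which are not finite here.
r_pearson  <- cor(x, y, method = "pearson")
# Spearman works on ranks, so Inf is simply the largest value.
r_spearman <- cor(x, y, method = "spearman")

# Locate the offending positions and recompute on the finite pairs only.
bad <- which(!is.finite(x) | !is.finite(y))
ok  <- is.finite(x) & is.finite(y)
r_pearson_finite <- cor(x[ok], y[ok], method = "pearson")

print(c(nan_check = nan_check, finite_check = finite_check))
print(c(pearson = r_pearson, spearman = r_spearman, pearson_finite = r_pearson_finite))
print(bad)
```

If the real data do contain infinite values, it is worth finding out where they come from (a log of zero or a division by zero upstream, for instance) before deciding to drop those rows.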

You could also try calculating Pearson's correlation by hand to get some insight into where the problem is (in the numerator, the denominator, or both):

pearson_num = cov(results_summary$D_in, results_summary$D_ex, use="complete.obs")

pearson_den = c(sd(results_summary$D_in), sd(results_summary$D_ex))
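
To turn these two pieces into a coefficient, divide the covariance by the product of the two standard deviations. The following is a minimal sketch on hypothetical vectors x and y that stand in for results_summary$D_in and results_summary$D_ex; the name r_by_hand is introduced here for illustration only.

```r
# Hypothetical toy vectors standing in for results_summary$D_in and results_summary$D_ex.
x <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

pearson_num <- cov(x, y)          # numerator: sample covariance
pearson_den <- c(sd(x), sd(y))    # denominator parts: sample standard deviations

# Inspect the pieces first; a NaN, NA, Inf or 0 here points at the problem.
print(pearson_num)
print(pearson_den)

# Pearson's r is the covariance divided by the product of the standard deviations.
r_by_hand <- pearson_num / prod(pearson_den)

# On clean data this agrees with the built-in result.
print(all.equal(r_by_hand, cor(x, y, method = "pearson")))
```

Running the same three steps on the real columns shows which term is not finite, which narrows down which variable to inspect.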
