Spearman correlation and ties

Question

I'm computing Spearman's rho on small sets of paired rankings. Spearman is well known for not handling ties properly. For example, taking 2 sets of 8 rankings, even if 6 are ties in one of the two sets, the correlation is still very high:

> cor.test(c(1,2,3,4,5,6,7,8), c(0,0,0,0,0,0,7,8), method="spearman")

    Spearman's rank correlation rho

S = 19.8439, p-value = 0.0274

sample estimates:
      rho 
0.7637626 

Warning message:
 Cannot compute exact p-values with ties

The p-value <.05 seems like a pretty high statistical significance for this data. Is there a ties-corrected version of Spearman in R? What is the best formula to date to compute it with a lot of ties?

Answer

Well, Kendall's tau rank correlation is also a non-parametric test for statistical dependence between two ordinal (or rank-transformed) variables. It is like Spearman's rho, but unlike Spearman's rho, it can handle ties.

More specifically, there are three Kendall tau statistics: tau-a, tau-b, and tau-c. tau-b is specifically adapted to handle ties.
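The R sketch below shows how the three variants differ on tied data. It counts concordant and discordant pairs by brute force on the question's data and computes all three statistics. `kendall_variants` is a hypothetical helper, not a base R function. For tau-c, the sketch assumes Stuart's definition, 2(P - Q) / (n^2 (m - 1) / m), where m is the smaller of the numbers of distinct values in x and y. Check that this matches the convention you need.

```r
# Hypothetical helper: Kendall's tau-a, tau-b and tau-c from brute-force pair counts.
kendall_variants <- function(x, y) {
  n <- length(x)
  pairs <- combn(n, 2)                        # every unordered pair (i, j), i < j
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])   # 0 when the pair is tied on x
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])   # 0 when the pair is tied on y
  P <- sum(dx * dy > 0)                       # concordant pairs
  Q <- sum(dx * dy < 0)                       # discordant pairs
  n0 <- ncol(pairs)                           # n(n - 1) / 2 pairs in total
  m <- min(length(unique(x)), length(unique(y)))  # Stuart's m for tau-c (assumed convention)
  c(
    tau_a = (P - Q) / n0,                                  # ignores ties in the divisor
    tau_b = (P - Q) / sqrt(sum(dx != 0) * sum(dy != 0)),   # divisor excludes tied pairs
    tau_c = 2 * (P - Q) / (n^2 * (m - 1) / m)              # Stuart's tau-c
  )
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)
kendall_variants(x, y)
# tau_a ≈ 0.4642857, tau_b ≈ 0.6813851, tau_c = 0.609375
```

Here x has no ties, but y has six tied zeros. tau-a therefore divides by all 28 pairs. tau-b uses only the 13 pairs not tied on y in one factor of its divisor, so tau-a comes out noticeably lower than tau-b.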

The tau-b statistic handles ties (i.e., pairs whose two members have the same ordinal value on a variable) through its divisor. The divisor is the geometric mean of the number of pairs not tied on x and the number of pairs not tied on y.

Kendall's tau is not Spearman's rho. They are not the same, but they are quite similar. You'll have to decide, based on context, whether the two are similar enough that one can be substituted for the other.
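One quick way to judge whether they are close enough for your data is to compute both on the same input. Base R's `cor()` returns Kendall's tau-b for `method = "kendall"`. `cor.test()` with the same method gives a test that accounts for ties through a normal approximation. The snippet uses the question's data. The p-value it prints is not reproduced here.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

rho <- cor(x, y, method = "spearman")  # Pearson correlation of average ranks
tau <- cor(x, y, method = "kendall")   # Kendall's tau-b (tie-adjusted divisor)
c(rho = rho, tau_b = tau)
# rho ≈ 0.7637626, tau_b ≈ 0.6813851

# Kendall test: with ties, R uses a normal approximation rather than an exact p-value.
# exact = FALSE requests that approximation up front.
cor.test(x, y, method = "kendall", exact = FALSE)
```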

For instance, tau-b:

Kendall_tau_b = (P - Q) / ( (P + Q + Y0)*(P + Q + X0) )^0.5

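P: number of concordant pairs ('concordant' means the two observations in the pair are ordered the same way on x and on y)

Q: number of discordant pairs (ordered in opposite ways on x and y)

X0: number of pairs tied on x but not on y

Y0: number of pairs tied on y but not on x

Pairs tied on both x and y count in neither X0 nor Y0. As a result, P + Q + Y0 is the number of pairs not tied on x, and P + Q + X0 is the number of pairs not tied on y. This is why the divisor is the geometric mean of those two counts.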
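The definitions can be checked numerically. The sketch below uses a hypothetical helper, `tau_b_counts`, to count P, Q, X0 and Y0 on the question's data. It treats X0 and Y0 as pairs tied on one variable only, so a pair tied on both goes into neither count. It then evaluates the tau-b formula and the geometric-mean form of the divisor side by side. Both can be compared with base R's `cor(..., method = "kendall")`, which returns tau-b.

```r
# Hypothetical helper: tau-b from the counts P, Q, X0, Y0.
tau_b_counts <- function(x, y) {
  pairs <- combn(length(x), 2)                # every unordered pair (i, j), i < j
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])   # 0 if the pair is tied on x
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])   # 0 if the pair is tied on y
  P  <- sum(dx * dy > 0)                      # concordant pairs
  Q  <- sum(dx * dy < 0)                      # discordant pairs
  X0 <- sum(dx == 0 & dy != 0)                # tied on x only
  Y0 <- sum(dy == 0 & dx != 0)                # tied on y only
  tau_b <- (P - Q) / sqrt((P + Q + Y0) * (P + Q + X0))
  # Same value with the divisor written as a geometric mean:
  # (pairs not tied on x) times (pairs not tied on y).
  tau_b_geo <- (P - Q) / sqrt(sum(dx != 0) * sum(dy != 0))
  c(P = P, Q = Q, X0 = X0, Y0 = Y0, tau_b = tau_b, tau_b_geo = tau_b_geo)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)
tau_b_counts(x, y)
# P = 13, Q = 0, X0 = 0, Y0 = 15, tau_b = tau_b_geo ≈ 0.6813851
```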

There is in fact a variant of Spearman's rho that explicitly accounts for ties. Still, whenever I have needed a non-parametric rank correlation statistic, I have chosen tau over rho. The reason is that rho sums squared rank differences, whereas tau sums absolute discrepancies. Both tau and rho are competent statistics, so the choice is ours. To me, a linear penalty on discrepancies (tau) has always seemed the more natural way to express rank correlation. That's not a recommendation; your context might be quite different and dictate otherwise.
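One common way to account for ties in Spearman's rho is to compute the Pearson correlation of average ranks. Whether this is exactly the variant meant above is an assumption. R's `cor()` and `cor.test()` with `method = "spearman"` already work this way, which is where the rho = 0.7637626 in the question comes from. The sketch contrasts it with the textbook shortcut 1 - 6 Σd² / (n(n² - 1)), which is exact only when there are no ties.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 0, 0, 0, 0, 7, 8)

rx <- rank(x)  # default ties.method = "average"
ry <- rank(y)  # the six tied zeros all get rank 3.5
n  <- length(x)

# Textbook shortcut formula; it assumes no ties.
rho_shortcut <- 1 - 6 * sum((rx - ry)^2) / (n * (n^2 - 1))

# Tie-corrected rho: Pearson correlation of the average ranks.
rho_ties <- cor(rx, ry)

c(shortcut = rho_shortcut, tie_corrected = rho_ties,
  r_spearman = cor(x, y, method = "spearman"))
# shortcut ≈ 0.7916667; tie_corrected and r_spearman ≈ 0.7637626
```

On this data the shortcut overstates rho slightly. The value R reports is already the tie-corrected one.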
