Measures of association in R -- Kendall's tau-b and tau-c

Question

Are there any R packages for the calculation of Kendall's tau-b and tau-c, and their associated standard errors? My searches on Google and Rseek have turned up nothing, but surely someone has implemented these in R.

Recommended answer

There are three Kendall tau statistics (tau-a, tau-b, and tau-c).

They are not interchangeable, and none of the answers posted so far deal with the last two, which is the subject of the OP's question.

I was unable to find functions to calculate tau-b or tau-c, either in the R standard library (stats et al.) or in any of the packages available on CRAN or other repositories. I used the excellent R package sos to search, so I believe the results returned were reasonably thorough.

So that's the short answer to the OP's question: I found no built-in or package function for tau-b or tau-c.

But it's easy to roll your own.

Writing R functions for the Kendall statistics is just a matter of translating these equations into code:

Kendall_tau_a = (P - Q) / (n * (n - 1) / 2)

Kendall_tau_b = (P - Q) / ( (P + Q + Y0) * (P + Q + X0) ) ^ 0.5 

Kendall_tau_c = (P - Q) * (2 * m) / (n ^ 2 * (m - 1))

tau-a: equal to concordant minus discordant pairs, divided by the total number of pairs, n * (n - 1) / 2, which depends only on the sample size.

tau-b: explicitly accounts for ties, i.e., pairs in which both members have the same value on x or on y; it equals concordant minus discordant pairs divided by the geometric mean of the number of pairs not tied on x (P + Q + Y0) and the number not tied on y (P + Q + X0), where X0 and Y0 count the pairs tied only on x and only on y.

tau-c: a variant for larger tables that also suits non-square tables; it equals concordant minus discordant pairs multiplied by a factor that adjusts for table size.

# Number of concordant pairs.
P = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx > c)])},
    r = r_ndx, c = c_ndx))
}

# Number of discordant pairs.
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply( function(r, c){
      sum(t[(r_ndx > r) & (c_ndx < c)])
  },
    r = r_ndx, c = c_ndx) )
}

# Sample size (total number of observations in the contingency table t).
n = sum(t)

# The lesser of number of rows or columns.
m = min(dim(t))

So these four parameters are all you need to calculate tau-a, tau-b, and tau-c:

  • P
  • Q
  • m
  • n

(plus X0 & Y0 for tau-b)

For instance, the code for tau-c is:

kendall_tau_c = function(t){
    t = as.matrix(t) 
    m = min(dim(t))
    n = sum(t)
    # Return the statistic directly so it is also printed at the console.
    (m * 2 * (P(t) - Q(t))) / ((n ^ 2) * (m - 1))
}
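By analogy, tau-a and tau-b can be sketched from the same equations. This is only an illustrative sketch, not a tested package function: the helper names tied_both, X0, Y0, kendall_tau_a and kendall_tau_b are made up here, and it assumes that X0 counts pairs tied only on the row variable and Y0 pairs tied only on the column variable. P and Q are repeated so that the block runs on its own.

```r
# Concordant and discordant pair counts, as defined earlier in the answer.
P = function(t) {
  r_ndx = row(t); c_ndx = col(t)
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx > c)]),
                 r = r_ndx, c = c_ndx))
}
Q = function(t) {
  r_ndx = row(t); c_ndx = col(t)
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx < c)]),
                 r = r_ndx, c = c_ndx))
}

# Pairs falling in the same cell are tied on both variables.
tied_both = function(t) sum(choose(t, 2))

# Assumption: X0 = pairs tied on the row variable only,
#             Y0 = pairs tied on the column variable only.
X0 = function(t) sum(choose(rowSums(t), 2)) - tied_both(t)
Y0 = function(t) sum(choose(colSums(t), 2)) - tied_both(t)

kendall_tau_a = function(t) {
  t = as.matrix(t)
  n = sum(t)
  (P(t) - Q(t)) / (n * (n - 1) / 2)
}

kendall_tau_b = function(t) {
  t = as.matrix(t)
  p = P(t); q = Q(t)
  (p - q) / sqrt((p + q + Y0(t)) * (p + q + X0(t)))
}

# Usage on a small hypothetical 2 x 2 table of counts.
tab = matrix(c(3, 1, 1, 3), nrow = 2)
kendall_tau_a(tab)
kendall_tau_b(tab)
```

Because the tau-b denominator is symmetric in X0 and Y0, swapping rows and columns of the table does not change the result.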

So how are Kendall's tau statistics related to the other statistical tests used in categorical data analysis?

All three Kendall tau statistics, along with Goodman and Kruskal's gamma, measure correlation in ordinal and binary data. (The Kendall tau statistics are more sophisticated alternatives to the gamma statistic, which uses only P and Q: (P - Q) / (P + Q).)
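For comparison, gamma can be sketched directly from two raw ordinal vectors rather than from a contingency table. The function name goodman_kruskal_gamma is hypothetical, and the pairwise sign comparison used here is just one simple way to count concordant and discordant pairs; it takes O(n^2) memory, so it suits small samples only.

```r
# Goodman and Kruskal's gamma from two equal-length ordinal vectors.
goodman_kruskal_gamma = function(x, y) {
  # s[i, j] > 0 when pair (i, j) is concordant, < 0 when discordant,
  # and 0 when the pair is tied on x or on y.
  s = sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
  P = sum(s > 0) / 2   # each pair appears twice in the n x n matrix
  Q = sum(s < 0) / 2
  (P - Q) / (P + Q)
}

goodman_kruskal_gamma(1:5, 1:5)   # 1: every pair is concordant
```

Tied pairs drop out of both the numerator and the denominator, which is why gamma is never smaller in magnitude than tau-b on the same data.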

And so Kendall's tau and gamma are counterparts to the simple chi-square and Fisher's exact tests, both of which are (as far as I know) suitable only for nominal data.

Example:

cpa_group = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1)
revenue_per_customer_group = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2)
weight = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)

dfx = data.frame(CPA=cpa_group, LCV=revenue_per_customer_group, freq=weight)

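# Reshape data frame so there is 1 row for each event
# (preliminary step to create a frequency-weighted contingency table).
dfx2 = data.frame(lapply(dfx, function(x) { rep(x, dfx$freq)}))

# Cross-tabulate using the data frame's actual column names.
# Pass dfx2 instead of dfx to weight each row by freq.
t = xtabs(~ LCV + CPA, dfx)

kc = kendall_tau_c(t)

# Returns -.35 (for the table built from dfx).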
