R: Convert contingency table to long data.frame
Question
Suppose you are given a summarized crosstable like this:
kdat <- data.frame(positive = c(8, 4), negative = c(3, 6),
                   row.names = c("positive", "negative"))
kdat
#>          positive negative
#> positive        8        3
#> negative        4        6
Now you want to compute Cohen's Kappa, a statistic that measures the agreement between two raters. Given data in this format, you can use psych::cohen.kappa:
psych::cohen.kappa(kdat)$kappa
#> Warning in any(abs(bounds)): coercing argument of type 'double' to logical
#> [1] 0.3287671
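For reference, the same value can be reproduced by hand from the crosstable. The following is a minimal sketch of the standard unweighted Cohen's Kappa formula, (p_o - p_e) / (1 - p_e), in base R; the variable names (kmat, p_o, p_e, kappa_hat) are made up for illustration and this is not how psych computes it internally.

```r
# Same counts as kdat, stored as a matrix (rows and columns are the two raters)
kmat <- matrix(c(8, 4, 3, 6), nrow = 2,
               dimnames = list(c("positive", "negative"),
                               c("positive", "negative")))

n   <- sum(kmat)                                  # total number of rated cases
p_o <- sum(diag(kmat)) / n                        # observed agreement
p_e <- sum(rowSums(kmat) * colSums(kmat)) / n^2   # agreement expected by chance

kappa_hat <- (p_o - p_e) / (1 - p_e)
kappa_hat
```

With these counts, p_o is 14/21 and p_e is 222/441, so the result simplifies to 72/219, which agrees with the value reported above.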
This irks me, because I prefer my data long and thin, which would let me use irr::kappa2, a similar function that I prefer for arbitrary reasons. So I assembled this function to reformat my data:
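longify_xtab <- function(x) {
  nm <- names(x)
  # Convert to table
  x_tab <- as.table(as.matrix(x))
  # Just in case there are no rownames, which are required for conversion
  # (this assumes a square table whose row labels match the column names)
  rownames(x_tab) <- nm
  # Use appropriate method to get a df with one row per cell and a Freq column
  x_df <- as.data.frame(x_tab)
  # Restructure df in a painful and unsightly way:
  # repeat each label column (all but Freq) according to the cell counts
  data.frame(lapply(x_df[seq_len(ncol(x_df) - 1)], function(col) {
    rep(col, x_df$Freq)
  }))
}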
The function returns the following format:
longify_xtab(kdat)
#> Var1 Var2
#> 1 positive positive
#> 2 positive positive
#> 3 positive positive
#> 4 positive positive
#> 5 positive positive
#> 6 positive positive
#> 7 positive positive
#> 8 positive positive
#> 9 negative positive
#> 10 negative positive
#> 11 negative positive
#> 12 negative positive
#> 13 positive negative
#> 14 positive negative
#> 15 positive negative
#> 16 negative negative
#> 17 negative negative
#> 18 negative negative
#> 19 negative negative
#> 20 negative negative
#> 21 negative negative
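One way to sanity-check a long-format result like the one above is to cross-tabulate it again and compare the cell counts with the original table. The sketch below rebuilds the 21 rows by hand rather than calling longify_xtab, so that it runs on its own; the name long_df is made up for illustration.

```r
# Cell labels in the same order as the output above, repeated by their counts
counts <- c(8, 4, 3, 6)
long_df <- data.frame(
  Var1 = rep(c("positive", "negative", "positive", "negative"), times = counts),
  Var2 = rep(c("positive", "positive", "negative", "negative"), times = counts)
)

# Cross-tabulating the long data should recover the original cell counts
xt <- table(long_df$Var1, long_df$Var2)
xt
```

Note that table() orders the labels alphabetically here, so the rows and columns come out as negative, positive rather than in the original order; the counts per cell are what matter.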
...which lets us calculate Kappa via irr::kappa2:
irr::kappa2(longify_xtab(kdat))$value
#> [1] 0.3287671
My question is:
Is there a better way to do this (in base R or with a package)? It strikes me as a relatively simple problem, but in trying to solve it I realized that it's oddly tricky, at least in my head.
Recommended answer
Here is some public domain code from http://www.cookbook-r.com/Manipulating_data/Converting_between_data_frames_and_contingency_tables/ which I have used to do exactly what you are asking.
# Convert from data frame of counts to data frame of cases.
# `countcol` is the name of the column containing the counts
countsToCases <- function(x, countcol = "Freq") {
  # Get the row indices to pull from x
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  # Drop count column
  x[[countcol]] <- NULL
  # Get the rows from x
  x[idx, ]
}
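The helper expects a data frame of counts, not the wide crosstable itself, so kdat from the question needs one conversion step first. A minimal sketch of how the pieces might fit together follows; the intermediate names counts_df and cases are made up for illustration, and the helper is repeated only so the block runs on its own.

```r
# Crosstable from the question
kdat <- data.frame(positive = c(8, 4), negative = c(3, 6),
                   row.names = c("positive", "negative"))

# Helper from the answer above
countsToCases <- function(x, countcol = "Freq") {
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  x[[countcol]] <- NULL
  x[idx, ]
}

# as.table() keeps the dimnames of the matrix; as.data.frame() on a table
# yields one row per cell with columns Var1, Var2 and Freq
counts_df <- as.data.frame(as.table(as.matrix(kdat)))

# Expand to one row per rated case and reset the row names
cases <- countsToCases(counts_df)
rownames(cases) <- NULL

# Optional: compute Kappa if the irr package happens to be installed
if (requireNamespace("irr", quietly = TRUE)) {
  print(irr::kappa2(cases)$value)
}
```

Because cases contains the same 21 rating pairs as the output of longify_xtab(kdat), irr::kappa2 should report the same Kappa as in the question.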