How to calculate a table of pairwise counts from a long-form data frame

Problem description

I have a 'long-form' data frame with columns id (the primary key) and featureCode (categorical variable). Each record has between 1 and 9 values of the categorical variable. For example:

id  featureCode
5   PPLC
5   PCLI
6   PPLC
6   PCLI
7   PPL
7   PPLC
7   PCLI
8   PPLC
9   PPLC
10  PPLC

I'd like to calculate the number of times each feature code is used with the other feature codes (the "pairwise counts" of the title). At this stage, the order each feature code is used is not important. I envisage the result would be another data frame, where the rows and columns are feature codes, and the cells are counts. For example:

      PPLC  PCLI  PPL
PPLC  0     3     1
PCLI  3     0     1
PPL   1     1     0

Unfortunately, I don't know how to perform this calculation and I've drawn a blank when searching for advice (mostly, I suspect, because I don't know the correct terminology).

Recommended answer

Here is a data.table approach similar to @mrdwab's. It works best if featureCode is a character column:

library(data.table)

# `dat` is the long-form data frame from the question (columns id, featureCode)
DT <- data.table(dat)
# convert to character (featureCode may be a factor)
DT[, featureCode := as.character(featureCode)]
# keep only ids with more than one feature code
DT2 <- DT[, N := .N, by = id][N > 1]
# create all combinations of 2 within each id,
# return them as a data.table with columns `V1` and `V2`,
# then count the rows in each (V1, V2) group
DT2[, rbindlist(combn(featureCode, 2,
      FUN = function(x) as.data.table(as.list(x)), simplify = FALSE)),
    by = id][, .N, by = list(V1, V2)]


     V1   V2 N
1: PPLC PCLI 3
2:  PPL PPLC 1
3:  PPL PCLI 1
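
The result above lists each pair in long form, whereas the question asks for a square table with feature codes as both rows and columns. The following is a minimal base R sketch of one way to get that shape, not part of the original answer. It assumes each id lists a given feature code at most once; the names `dat`, `inc`, and `pairs` are illustrative, and `dat` is rebuilt here from the example data in the question.

```r
# Example data copied from the question
dat <- data.frame(
  id = c(5, 5, 6, 6, 7, 7, 7, 8, 9, 10),
  featureCode = c("PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                  "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"),
  stringsAsFactors = FALSE
)

# Incidence table: one row per id, one column per feature code,
# with 1 where the id uses that code and 0 otherwise
inc <- table(dat$id, dat$featureCode)

# t(inc) %*% inc: each cell counts the ids that use both codes
pairs <- crossprod(inc)

# The diagonal holds each code's own frequency; zero it to match
# the layout requested in the question
diag(pairs) <- 0

pairs
```

Because the cross-product is symmetric, the order in which codes appear within an id does not affect the counts. The rows and columns come out in alphabetical order of the feature codes rather than in the order shown in the question.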
