Counts of combinations of values in a dataframe R

Views: 66 Published: 2020/10/5 22:02:13 Tags: r dataframe combinations

Problem description

I have a dataframe like so:

df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C", "D", 
"D", "E", "E"), expertise = c("r", "python", "julia", "python", 
"r", "python", "julia", "python", "julia", "r", "julia")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -11L), .Names = c("id", 
"expertise"), spec = structure(list(cols = structure(list(id = structure(list(), class = c("collector_character", 
"collector")), expertise = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("id", "expertise")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df
   id expertise
1   A         r
2   A    python
3   A     julia
4   B    python
5   B         r
6   C    python
7   C     julia
8   D    python
9   D     julia
10  E         r
11  E     julia

I can get the overall counts of "expertise" by using:

library(dplyr)
df %>% group_by(expertise) %>% mutate(counts_overall = n())

However, what I want is the counts for combinations of expertise values. In other words, how many "id" values have the same combination of two expertise values, e.g. "r" and "julia"? Here is the desired output:

df_out<-structure(list(expertise1 = c("r", "r", "python"), expertise2 = c("python", 
"julia", "julia"), count = c(2L, 2L, 3L)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -3L), .Names = c("expertise1", 
"expertise2", "count"), spec = structure(list(cols = structure(list(
    expertise1 = structure(list(), class = c("collector_character", 
    "collector")), expertise2 = structure(list(), class = c("collector_character", 
    "collector")), count = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("expertise1", "expertise2", "count"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df_out
  expertise1 expertise2 count
1          r     python     2
2          r      julia     2
3     python      julia     3

Solution

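The linked answer from latemail's comment (https://stackoverflow.com/questions/51923115/counts-of-combinations-of-values-in-a-dataframe-r#comment90796269_51923115) creates a matrix: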

crossprod(table(df) > 0)

         expertise
expertise julia python r
   julia      4      3 2
   python     3      4 2
   r          2      2 3

while the OP expects a dataframe in long format.
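
For completeness, the matrix itself can be reshaped into long format with base R alone. This is only a sketch: it rebuilds the question's data as a plain data.frame, and the names m, idx and long are made up for illustration. It keeps the cells above the diagonal so that each unordered pair appears once.

```r
# Same toy data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  id = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  expertise = c("r", "python", "julia", "python", "r", "python",
                "julia", "python", "julia", "r", "julia"),
  stringsAsFactors = FALSE
)

# Co-occurrence matrix: number of ids that have both expertise values
m <- crossprod(table(df) > 0)

# Positions of the cells above the diagonal (each unordered pair once)
idx <- which(upper.tri(m), arr.ind = TRUE)

# Look up the labels and the counts for those positions
long <- data.frame(
  expertise1 = rownames(m)[idx[, "row"]],
  expertise2 = colnames(m)[idx[, "col"]],
  count = m[idx],
  stringsAsFactors = FALSE
)
long
#   expertise1 expertise2 count
# 1      julia     python     3
# 2      julia          r     2
# 3     python          r     2
```

The diagonal of m holds the per-value totals (how many ids list each expertise), which is why it is dropped here.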

1) cross join

Below is a data.table solution which uses the CJ() (cross join) function:

library(data.table)
setDT(df)[, CJ(expertise, expertise)[V1 < V2], by = id][
  , .N, by = .(expertise1 = V1, expertise2 = V2)]

   expertise1 expertise2 N
1:      julia     python 3
2:      julia          r 2
3:     python          r 2

Applied per id, CJ(expertise, expertise)[V1 < V2] is the data.table equivalent of t(combn(expertise, 2)) or combinat::combn2(expertise); the V1 < V2 filter additionally puts the two values of each pair in alphabetical order.
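
To make the combn() comparison concrete, here is a rough base R sketch of the same idea. It assumes every id has at least two distinct expertise values (combn() fails on a character vector of length one), and the names pairs, pairs_df and out are invented for this example.

```r
# Same toy data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  id = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  expertise = c("r", "python", "julia", "python", "r", "python",
                "julia", "python", "julia", "r", "julia"),
  stringsAsFactors = FALSE
)

# For each id, list all pairs of its (sorted) expertise values.
# Sorting first puts every pair in alphabetical order, like V1 < V2 does.
pairs <- do.call(rbind, unname(lapply(
  split(df$expertise, df$id),
  function(x) t(combn(sort(x), 2))
)))

# One row per (id, pair); summing a column of ones counts the ids per pair
pairs_df <- data.frame(
  expertise1 = pairs[, 1],
  expertise2 = pairs[, 2],
  count = 1L,
  stringsAsFactors = FALSE
)
out <- aggregate(count ~ expertise1 + expertise2, data = pairs_df, FUN = sum)
out
```

This should give the same three pairs and counts as the data.table version; ids with a single expertise value would need to be filtered out before calling combn().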

2) self-join

Here is another variant which uses a self-join:

library(data.table)
setDT(df)[df, on = "id", allow.cartesian = TRUE][
  expertise < i.expertise, .N, by = .(expertise1 = expertise, expertise2 = i.expertise)]

   expertise1 expertise2 N
1:     python          r 2
2:      julia          r 2
3:      julia     python 3
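
If data.table is not available, the self-join idea can be sketched in base R with merge(). This is an illustration rather than part of the original answer; joined and out are hypothetical names, and the .x / .y suffixes are the ones merge() adds by default to the duplicated column.

```r
# Same toy data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  id = c("A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E"),
  expertise = c("r", "python", "julia", "python", "r", "python",
                "julia", "python", "julia", "r", "julia"),
  stringsAsFactors = FALSE
)

# Self-join on id: every expertise of an id is paired with every other one
joined <- merge(df, df, by = "id")

# Keep each unordered pair once (and drop pairs of a value with itself)
joined <- joined[joined$expertise.x < joined$expertise.y, ]

# Count the ids per pair
out <- aggregate(id ~ expertise.x + expertise.y, data = joined, FUN = length)
names(out) <- c("expertise1", "expertise2", "count")
out
```

As in the data.table variant, the join produces more rows than the input before filtering, so this is best suited to data where each id has a modest number of values.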
