R: count combinations of elements in groups

Question

I wish to count the number of times each combination of two elements appears in the same group.

For example, with:

> library(data.table)
> dat = data.table(group = c(1,1,1,2,2,2,3,3), id=c(10,11,12,10,11,13,11,13))
> dat
   group id
1:     1 10
2:     1 11
3:     1 12
4:     2 10
5:     2 11
6:     2 13
7:     3 11
8:     3 13

The expected result would be:

id.1  id.2  nb_common_appearances
10    11    2                      (in group 1 and 2)
10    12    1                      (in group 1)
11    12    1                      (in group 1)
10    13    1                      (in group 2)
11    13    2                      (in group 2 and 3)


Recommended answer

Here is a data.table approach (roughly the same as @josilber's from plyr):

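# one row per within-group pair: combn() returns a pair per column, and
# split(..., 1:2) turns its two rows into the columns id.1 and id.2
pairs <- dat[, c(id=split(combn(id,2),1:2)), by=group ]
# count the groups in which each pair appears
pairs[, .N, by=.(id.1,id.2) ]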
#    id.1 id.2 N
# 1:   10   11 2
# 2:   10   12 1
# 3:   11   12 1
# 4:   10   13 1
# 5:   11   13 2
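One caveat with the combn route (my own observation, not part of the answer): when a group holds a single numeric id, combn(id, 2) treats that number n as seq_len(n), so it silently invents pairs of 1:n instead of returning nothing. A minimal guard, sketched here with a hypothetical extra group 4 that contains only id 99, is to return NULL for groups with fewer than two rows; data.table drops such groups from the result.

```r
library(data.table)

# Same data as the question, plus a hypothetical group 4 holding a single id
dat <- data.table(group = c(1, 1, 1, 2, 2, 2, 3, 3, 4),
                  id    = c(10, 11, 12, 10, 11, 13, 11, 13, 99))

# combn(99, 2) would expand 99 to seq_len(99), so skip groups with < 2 ids;
# the if() without else yields NULL, and data.table omits that group
pairs <- dat[, if (.N >= 2L) c(id = split(combn(id, 2), 1:2)), by = group]

# count the groups in which each pair appears
counts <- pairs[, .N, by = .(id.1, id.2)]
print(counts)
```

The counts match the expected result in the question, and group 4 contributes no rows.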

You might also consider viewing the results in a table:

pairs[, table(id.1,id.2) ]
#     id.2
# id.1 11 12 13
#   10  2  1  1
#   11  0  1  2
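A different technique that needs neither combn nor a join (not from the answer, and it assumes an id occurs at most once per group): build the group-by-id incidence table and take its cross-product. Cell [i, j] of crossprod() then counts the groups containing both i and j, a symmetric version of the table above. A base-R sketch:

```r
# Base R only: the question's data as a plain data.frame
dat <- data.frame(group = c(1, 1, 1, 2, 2, 2, 3, 3),
                  id    = c(10, 11, 12, 10, 11, 13, 11, 13))

# Incidence table: cell [g, i] is 1 when id i occurs in group g
inc <- table(dat$group, dat$id)

# crossprod(inc) is t(inc) %*% inc, so cell [i, j] counts groups holding both i and j
co <- crossprod(inc)

# Keep each unordered pair once (upper triangle) and drop pairs that never co-occur
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
res <- data.frame(id.1 = as.numeric(rownames(co)[idx[, "row"]]),
                  id.2 = as.numeric(colnames(co)[idx[, "col"]]),
                  N    = co[idx])
print(res)
```

crossprod() returns the counts as doubles, and an id repeated within a group would inflate them, so deduplicate first if that can happen.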

You can use merges instead of combn:

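setkey(dat, group)
# self-join on group gives every within-group (id, i.id) combination;
# id < i.id keeps each pair once and drops self-pairs before counting
dat[ dat, allow.cartesian=TRUE ][ id<i.id, .N, by=.(id,i.id) ]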
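For readers without data.table, the same self-join idea can be sketched with base R's merge(). This is an analogue of the line above, not the answerer's code: join the data to itself by group, keep rows with id.x < id.y, and count per pair with aggregate().

```r
# Question's data as a plain data.frame (base R only)
dat <- data.frame(group = c(1, 1, 1, 2, 2, 2, 3, 3),
                  id    = c(10, 11, 12, 10, 11, 13, 11, 13))

# Self-join on group: every row meets every row of the same group
both <- merge(dat, dat, by = "group")   # columns: group, id.x, id.y

# Keep each unordered pair once and drop self-pairs
both <- both[both$id.x < both$id.y, ]

# Count the groups in which each pair appears
res <- aggregate(group ~ id.x + id.y, data = both, FUN = length)
names(res) <- c("id.1", "id.2", "N")
print(res)
```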

Benchmarks. For large data, the merge can be faster than combn (as hypothesized by @DavidArenburg), and @Arun's answer is faster still:

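library(data.table)
# one group of 1,500 ids, keyed on g so that DT[DT] self-joins within the group
DT <- data.table(g=1,id=1:(1.5e3),key="g")
system.time({a <- combn(DT$id,2)})
#    user  system elapsed
#    0.81    0.00    0.81
system.time({b <- DT[DT,allow.cartesian=TRUE][id<i.id]})
#    user  system elapsed
#    0.13    0.00    0.12
# indices() is the helper defined in @Arun's answer (not reproduced here)
system.time({d <- DT[,.(rep(id,(.N-1L):0L),id[indices(.N-1L)])]})
#    user  system elapsed
#    0.01    0.00    0.02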
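The last timing calls indices(), a helper from @Arun's answer whose definition is not reproduced on this page. Below is one possible definition under a hypothetical name, pair_second_index(), not necessarily @Arun's: for a group of N ids it returns the position of the second member of each pair (i, j) with i < j, lined up with the first members that rep(id, (N-1):0) produces.

```r
library(data.table)

# Hypothetical helper: with n = N - 1, first member i appears k[i] = N - i times,
# paired with second members i + 1, ..., N
pair_second_index <- function(n) {
  k <- rev(seq_len(n))              # n, n - 1, ..., 1 (empty when n = 0)
  sequence(k) + rep(seq_len(n), k)  # offsets 1..k[i] added to each i
}

pair_second_index(3)  # N = 4 ids: 2 3 4 3 4 4

dat <- data.table(group = c(1, 1, 1, 2, 2, 2, 3, 3),
                  id    = c(10, 11, 12, 10, 11, 13, 11, 13))

# all within-group pairs without combn(), then the same count as before
pairs <- dat[, .(id.1 = rep(id, (.N - 1L):0L),
                 id.2 = id[pair_second_index(.N - 1L)]), by = group]
counts <- pairs[, .N, by = .(id.1, id.2)]
print(counts)
```

Using seq_len() rather than n:1 keeps the helper correct for a one-id group (n = 0), where it returns an empty index to match the empty rep().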

(I left out the group-by operation as I don't think it will be important to the timings.)

In defense of combn: the combn approach extends naturally to larger tuples, whereas the merge and @Arun's answer, though much faster for pairs, do not (as far as I can see):

DT2        <- data.table(g=rep(1:2,each=5),id=1:5)  
tuple_size <- 4

tuples <- DT2[, c(id=split(combn(id,tuple_size),1:tuple_size)), by=g ]
tuples[, .N, by=setdiff(names(tuples),"g")]    
#    id.1 id.2 id.3 id.4 N
# 1:    1    2    3    4 2
# 2:    1    2    3    5 2
# 3:    1    2    4    5 2
# 4:    1    3    4    5 2
# 5:    2    3    4    5 2
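The combn-based code above, for pairs and tuples alike, relies on ids being listed in the same order in every group: combn() keeps the input order, so c(10, 11) and c(11, 10) would be counted as different tuples. A hypothetical wrapper, count_tuples(), sketches one way to make it order-independent: sort ids within each group, skip groups smaller than the tuple size, and index into the sorted ids with combn(.N, k) so a single numeric id is never mistaken for seq_len(n).

```r
library(data.table)

# Hypothetical helper (not from the answer): count how many groups contain
# each k-tuple of ids
count_tuples <- function(dt, group_col, id_col, k) {
  tuples <- dt[, if (.N >= k) {                  # skip groups too small for a k-tuple
    ids  <- sort(.SD[[id_col]])                  # same id order in every group
    vals <- ids[combn(.N, k)]                    # index combos, column by column
    setNames(split(vals, seq_len(k)), paste0("id.", seq_len(k)))
  }, by = group_col, .SDcols = id_col]
  tuples[, .N, by = setdiff(names(tuples), group_col)]
}

DT2 <- data.table(g = rep(1:2, each = 5), id = rep(1:5, 2))
count_tuples(DT2, "g", "id", 4)
```

With DT2 this reproduces the five 4-tuples above, each with N = 2.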
