Populating a "count matrix" with permutations of R data.table rows


Question

(For the following, I could use either an R data.frame or an R data.table. Both are OK.)

I have the following data.table:

library(data.table)

dt = data.table(V1=c("dog", "dog", "cat", "cat", "cat", "bird","bird","bird","bird"), 
                    V2=rep(42, 9), V3=c(1, 2, 4, 5, 7, 1, 2, 5, 8)) 

> print(dt)
     V1 V2 V3
1:  dog 42  1
2:  dog 42  2
3:  cat 42  4
4:  cat 42  5
5:  cat 42  7
6: bird 42  1
7: bird 42  2
8: bird 42  5
9: bird 42  8

Column V3 contains integers from 1 to 8. My goal is to populate an 8 by 8 zero matrix with the count of each combination "pair" within each unique category in column V1.

So, the combination pairs for dog, cat, and bird are:

dog: (1, 2)
cat: (4, 5), (4, 7), (5, 7)
bird: (1, 2), (1, 5), (1, 8), (2, 5), (2, 8), (5, 8)
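
As a minimal base R sketch of how these pairs could be enumerated (not the approach used in the answer below), one option is to split V3 by V1 and call `combn()` on each group. The names `df` and `pairs_by_group` are made up for illustration, and the sketch assumes every category has at least two rows, since `combn()` treats a single positive number as `seq_len()` of that number.

```r
# Toy data copied from the question, as a plain data.frame.
df <- data.frame(
  V1 = c("dog", "dog", "cat", "cat", "cat", "bird", "bird", "bird", "bird"),
  V3 = c(1, 2, 4, 5, 7, 1, 2, 5, 8)
)

# Split V3 by category, then list every unordered pair within each category.
# combn(x, 2) returns a 2-row matrix with one pair per column.
# Assumption: each category has at least two values.
pairs_by_group <- lapply(split(df$V3, df$V1), function(x) combn(x, 2))

# Pairs for "cat": columns are (4, 5), (4, 7), (5, 7)
pairs_by_group$cat
```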

For each pair, I add +1 to the corresponding entry in the zero matrix. The matrix is symmetric, so (n, m) = (m, n). The matrix given dt would be:

   1 2 3 4 5 6 7 8
1: 0 2 0 0 1 0 0 1
2: 2 0 0 0 1 0 0 1
3: 0 0 0 0 0 0 0 0
4: 0 0 0 0 1 0 1 0
5: 1 1 0 1 0 0 1 1
6: 0 0 0 0 0 0 0 0
7: 0 0 0 1 1 0 0 0
8: 1 1 0 0 1 0 0 0

Note that (1,2)=(2,1) has a count of 2, from the dog combination and the bird combination.
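
The accumulation described above can be written as a plain loop. This is only an illustrative base R sketch (the names `df` and `m` are assumptions, and a loop like this is unlikely to be the fastest option for large data), but it makes the "+1 on both (n, m) and (m, n)" rule explicit.

```r
# Toy data copied from the question.
df <- data.frame(
  V1 = c("dog", "dog", "cat", "cat", "cat", "bird", "bird", "bird", "bird"),
  V3 = c(1, 2, 4, 5, 7, 1, 2, 5, 8)
)

n <- max(df$V3)
m <- matrix(0L, n, n)  # 8 x 8 zero matrix

# For every category, visit each unordered pair and add 1 to both
# mirrored cells so the matrix stays symmetric.
# Assumption: each category has at least two values.
for (x in split(df$V3, df$V1)) {
  p <- combn(x, 2)  # 2 x k matrix, one pair per column
  for (k in seq_len(ncol(p))) {
    i <- p[1, k]
    j <- p[2, k]
    m[i, j] <- m[i, j] + 1L
    m[j, i] <- m[j, i] + 1L
  }
}

m
```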

(1) Is there a method to calculate the combinations of values in an R data.table/data.frame column, grouped by the unique values in another column?

Perhaps it would make sense to output an R list of vector "pairs", e.g.

list(c(1, 2), c(2, 1), c(4, 5), c(4, 7), c(5, 7), c(5, 4), c(7, 4), c(7, 5),
    c(1, 2), c(1, 5), c(1, 8), c(2, 5), c(2, 8), c(5, 8), c(2, 1), c(5, 1),
    c(8, 1), c(5, 2), c(8, 2), c(8, 5))

However, I'm not sure how I would use this to populate a matrix...
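
One possible way to go from such a list to the matrix, sketched in base R: bind the pairs into a two-column index matrix and count repeated pairs with `table()`. The names `pairs`, `idx`, `counts`, and `m` are illustrative. Note that a direct assignment such as `m[idx] <- m[idx] + 1` would not accumulate repeated index pairs, which is why the sketch counts first.

```r
# The list of pairs from the question (both orientations are included).
pairs <- list(c(1, 2), c(2, 1), c(4, 5), c(4, 7), c(5, 7), c(5, 4), c(7, 4), c(7, 5),
              c(1, 2), c(1, 5), c(1, 8), c(2, 5), c(2, 8), c(5, 8), c(2, 1), c(5, 1),
              c(8, 1), c(5, 2), c(8, 2), c(8, 5))

# Stack the pairs into a 20 x 2 matrix: column 1 = row index, column 2 = column index.
idx <- do.call(rbind, pairs)
n <- max(idx)

# Fixing the factor levels to 1..n keeps rows/columns that never occur (3 and 6 here).
counts <- table(factor(idx[, 1], levels = seq_len(n)),
                factor(idx[, 2], levels = seq_len(n)))

# Drop the table class and dimnames to get a plain integer matrix.
m <- matrix(as.integer(counts), n, n)
m
```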

(2) Given the input data.table/data.frame, what would be the most efficient data structure to use to write out a matrix, as shown above?

Recommended answer

Here's a data.table solution that seems to be efficient. We basically do a self join to create the combinations and then count them. Then, similar to what @coldspeed did with NumPy, we update a zero matrix only at the locations that have counts.

# a self join
tmp <- dt[dt, 
             .(V1, id = x.V3, id2 = V3), 
             on = .(V1, V3 < V3), 
             nomatch = 0L,
             allow.cartesian = TRUE
          ][, .N, by = .(id, id2)]

## Create a zero matrix and update by locations
m <- array(0L, rep(max(dt$V3), 2L))
m[cbind(tmp$id, tmp$id2)] <- tmp$N
m + t(m)

#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    0    2    0    0    1    0    0    1
# [2,]    2    0    0    0    1    0    0    1
# [3,]    0    0    0    0    0    0    0    0
# [4,]    0    0    0    0    1    0    1    0
# [5,]    1    1    0    1    0    0    1    1
# [6,]    0    0    0    0    0    0    0    0
# [7,]    0    0    0    1    1    0    0    0
# [8,]    1    1    0    0    1    0    0    0
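
For readers without data.table, the same join-then-count idea can be sketched in base R with `merge()` and `aggregate()`. This is an assumed analogue, not the answer's code: `merge()` builds the full within-group cross product before filtering, so it is likely less efficient than the non-equi join above. The names `df`, `joined`, and `tmp` are illustrative.

```r
# Toy data copied from the question.
df <- data.frame(
  V1 = c("dog", "dog", "cat", "cat", "cat", "bird", "bird", "bird", "bird"),
  V3 = c(1, 2, 4, 5, 7, 1, 2, 5, 8)
)

# Self join on V1: the two V3 columns come back as V3.x and V3.y.
joined <- merge(df, df, by = "V1")

# Keep each unordered pair once (smaller value first).
joined <- joined[joined$V3.x < joined$V3.y, ]

# Count how many categories contribute each pair.
tmp <- aggregate(N ~ V3.x + V3.y, data = cbind(joined, N = 1L), FUN = sum)

# Fill the upper triangle by position, then mirror it.
m <- matrix(0L, max(df$V3), max(df$V3))
m[cbind(tmp$V3.x, tmp$V3.y)] <- tmp$N
m <- m + t(m)
m
```

Because `tmp` holds each pair only once after aggregation, the positional assignment does not run into the repeated-index problem.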


Alternatively, we could create tmp using data.table::CJ, but that could potentially be less memory efficient (thanks to @Frank for the tip), as it will create all possible combinations first, e.g.

tmp <- dt[, CJ(V3, V3)[V1 < V2], by = .(g = V1)][, .N, by = .(V1, V2)]

## Then, as previously
m <- array(0L, rep(max(dt$V3), 2L))
m[cbind(tmp$V1, tmp$V2)] <- tmp$N
m + t(m)

#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    0    2    0    0    1    0    0    1
# [2,]    2    0    0    0    1    0    0    1
# [3,]    0    0    0    0    0    0    0    0
# [4,]    0    0    0    0    1    0    1    0
# [5,]    1    1    0    1    0    0    1    1
# [6,]    0    0    0    0    0    0    0    0
# [7,]    0    0    0    1    1    0    0    0
# [8,]    1    1    0    0    1    0    0    0
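
A different technique, not taken from the answer, is to build a category-by-value incidence table and multiply it by its own transpose with `crossprod()`. The base R sketch below assumes each (V1, V3) combination occurs at most once, as in the example data; with repeated rows the products would over-count. The names `df`, `inc`, and `m` are illustrative.

```r
# Toy data copied from the question.
df <- data.frame(
  V1 = c("dog", "dog", "cat", "cat", "cat", "bird", "bird", "bird", "bird"),
  V3 = c(1, 2, 4, 5, 7, 1, 2, 5, 8)
)

# 3 x 8 incidence matrix: entry [g, v] is 1 if category g contains value v.
# Fixed factor levels keep the empty columns for values 3 and 6.
inc <- unclass(table(df$V1, factor(df$V3, levels = seq_len(max(df$V3)))))

# crossprod(inc) is t(inc) %*% inc: entry [i, j] counts the categories
# that contain both value i and value j.
m <- crossprod(inc)

# The diagonal counts each value paired with itself, which is not wanted here.
diag(m) <- 0
dimnames(m) <- NULL
m
```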
