R - indices of matching values of two data.tables
Question
This is my first post on StackOverflow. I am relatively new to programming and am trying to work with data.table in R because of its reputation for speed.
I have a very large data.table named "Actions", with 5 columns and potentially several million rows. The column names are k1, k2, i, l1 and l2. I have another data.table named "States", which holds the unique values of columns k1 and k2 (together with i) from Actions.
For every row in Actions, I would like to find the indices of the rows in States whose k1 and k2 match columns 4 and 5 (l1 and l2) of that row. Reproducible code follows:
library(data.table)
S.disc <- c(2000,2000)
S.max <- c(6200,2300)
S.min <- c(700,100)
Traces.num <- 3
Class.str <- lapply(1:2,function(x) seq(S.min[x],S.max[x],S.disc[x]))
Class.inf <- seq_len(Traces.num)
Actions <- data.table(expand.grid(Class.inf, Class.str[[2]], Class.str[[1]], Class.str[[2]], Class.str[[1]])[,c(5,4,1,3,2)])
setnames(Actions,c("k1","k2","i","l1","l2"))
States <- unique(Actions[,list(k1,k2,i)])
If I were using a data.frame, the lookup would look like this:
index <- apply(Actions,1,function(x) {which((States[,1]==x[4]) & (States[,2]==x[5]))})
How can I do the same efficiently with data.table?
Recommended answer
This is relatively simple once you get the hang of keys and the special symbols which may be used in the j expression of a data.table. Try this...
# First make an ID for each row for use in the `dcast`
# because you are going to have multiple rows with the
# same key values and you need to know where they came from
Actions[ , ID := 1:.N ]
# Set the keys to join on
setkeyv( Actions , c("l1" , "l2" ) )
setkeyv( States , c("k1" , "k2" ) )
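# Join Actions to States. For each row of Actions, '.I' gives
# the row numbers in States where its (l1, l2) key is found,
# and 1:.N numbers the matches within that group (a repeating 1,2,3).
# This relies on the implicit per-row grouping of older data.table
# versions; from v1.9.4 onwards that grouping has to be requested
# explicitly with `by = .EACHI`.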
New <- States[ J(Actions) , list( ID , Ind = .I , Row = 1:.N ) ]
# k1 k2 ID Ind Row
#1: 700 100 1 1 1
#2: 700 100 1 2 2
#3: 700 100 1 3 3
#4: 700 100 2 1 1
#5: 700 100 2 2 2
#6: 700 100 2 3 3
# reshape using 'dcast.data.table'
dcast.data.table( Row ~ ID , data = New , value.var = "Ind" )
# Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27...
#1: 1 1 1 1 4 4 4 7 7 7 10 10 10 13 13 13 16 16 16 1 1 1 4 4 4 7 7 7...
#2: 2 2 2 2 5 5 5 8 8 8 11 11 11 14 14 14 17 17 17 2 2 2 5 5 5 8 8 8...
#3: 3 3 3 3 6 6 6 9 9 9 12 12 12 15 15 15 18 18 18 3 3 3 6 6 6 9 9 9...
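The same lookup can also be sketched without setting keys, by storing the row number of States in a column and joining with the `on` argument that newer data.table versions provide. This is only an illustration and not the code from the answer above: the two small tables and the names `idx` and `res` are made up for the example, and `allow.cartesian = TRUE` is included on the assumption that one row of Actions may match several rows of States.

```r
library(data.table)

# Tiny stand-ins for the real tables (hypothetical values)
States  <- data.table(k1 = c(700, 700, 2700),
                      k2 = c(100, 100, 100),
                      i  = c(1, 2, 1))
Actions <- data.table(l1 = c(2700, 700),
                      l2 = c(100, 100))

# Remember the original row positions in both tables
States[ , idx := .I ]
Actions[ , ID := .I ]

# For each row of Actions, pull the row numbers of States whose
# (k1, k2) equal that row's (l1, l2). No keys are set, so neither
# table is reordered.
res <- States[ Actions ,
               list( ID , idx ) ,
               on = c( k1 = "l1" , k2 = "l2" ) ,
               allow.cartesian = TRUE ]
print(res)
```

Each row of `res` pairs an Actions row (`ID`) with one matching States row (`idx`), so it is in long format. It can be reshaped with `dcast` in the same way as above if one column per Actions row is needed.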