Check frequency of data.table value in other data.table
Problem description
library(data.table)
DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C"))
I want to add a column popular to DT2 with value TRUE whenever DT2$group is contained in DT1$group at least twice. So, in the example above, DT2 should be:
group popular
1: A TRUE
2: B TRUE
3: C FALSE
What would be an efficient way to get to this?
Updated example: DT2 may actually contain more groups than DT1, so here's an updated example:
DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))
And the desired output would be
group popular
1: A TRUE
2: B TRUE
3: C FALSE
4: D FALSE
Solution
I'd just do it this way:
## 1.9.4+
setkey(DT1, group)
DT1[J(DT2$group), list(popular = .N >= 2L), by=.EACHI]
# group popular
# 1: A TRUE
# 2: B TRUE
# 3: C FALSE
# 4: D FALSE ## on the updated example
data.table's join syntax is quite powerful: while joining, you can also aggregate, select, or update columns in j. Here we perform a join. For each row in DT2$group, we compute the j-expression .N >= 2L on the corresponding matching rows in DT1. Specifying by=.EACHI (please check the 1.9.4 NEWS) evaluates the j-expression once for each row of i, rather than once over all matched rows.
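To make the per-row evaluation more concrete, here is a minimal sketch that keeps the raw match count next to the resulting flag. The column name n_matches and the intermediate table counts are my own names for illustration, not part of the original answer. As far as I know, .N is 0 for a row of i that has no match in DT1 (such as D), which is why that group ends up FALSE.

```r
library(data.table)

DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))
setkey(DT1, group)

# One result row per row of i; .N is the number of DT1 rows matched by that row.
# n_matches is a hypothetical column name chosen for this sketch.
counts <- DT1[J(DT2$group), list(n_matches = .N), by = .EACHI]

# Derive the flag from the count, mirroring the .N >= 2L test in the answer.
counts[, popular := n_matches >= 2L]
print(counts)
```

Splitting the count from the comparison is only for inspection; the one-line form in the answer does the same work in a single step.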
In 1.9.4, .() has been introduced as an alias for list() in all of i, j and by. So you could also do:
DT1[.(DT2$group), .(popular = .N >= 2L), by=.EACHI]
When you're joining by a single character column, you can drop the .() / J() syntax altogether (for convenience). So this can also be written as:
DT1[DT2$group, .(popular = .N >= 2L), by=.EACHI]
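The calls above return a new table instead of adding the column to DT2, which is what the question asked for. A possible way to finish the job is sketched below. It assumes a data.table version that supports the on= argument (1.9.6 or later, to my knowledge), so no setkey() is needed; the intermediate name res is mine.

```r
library(data.table)

DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))

# Ad hoc join on "group" without setting a key (assumes on= is available).
res <- DT1[DT2, on = "group", .(popular = .N >= 2L), by = .EACHI]

# by = .EACHI yields one row per row of DT2, in DT2's order, so the
# column can be assigned to DT2 by reference.
DT2[, popular := res$popular]
print(DT2)
```

Because DT1 is not keyed here, its row order is left untouched, which may matter if other code relies on the original order.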