Check frequency of data.table value in other data.table


Problem description

 library(data.table)
 DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
 DT2 <- data.table(group = c("A", "B", "C"))

I want to add a column popular to DT2 with value TRUE whenever DT2$group is contained in DT1$group at least twice. So, in the example above, DT2 should be

    group popular
 1:     A    TRUE
 2:     B    TRUE
 3:     C   FALSE

What would be an efficient way to get to this?

Updated example: DT2 may actually contain more groups than DT1, so here's an updated example:

 DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
 DT2 <- data.table(group = c("A", "B", "C", "D"))

And the desired output would be

    group popular
 1:     A    TRUE
 2:     B    TRUE
 3:     C   FALSE
 4:     D   FALSE

Solution

I'd just do it this way:

## 1.9.4+
setkey(DT1, group)
DT1[J(DT2$group), list(popular = .N >= 2L), by=.EACHI]
#    group popular
# 1:     A    TRUE
# 2:     B    TRUE
# 3:     C   FALSE
# 4:     D   FALSE ## on the updated example

data.table's join syntax is quite powerful: while joining, you can also aggregate, select, or update columns in j. Here we perform a join. For each row in DT2$group, the j-expression .N >= 2L is evaluated on the matching rows in DT1. Specifying by=.EACHI (please check the 1.9.4 NEWS) makes j run once per row of i rather than once on the whole join result, so a group with no match in DT1, such as D, still gets its own row with FALSE.
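The join above returns a new table rather than modifying DT2. Since the question asks for a column on DT2 itself, one possible follow-up is sketched below. It assumes that the rows of a by=.EACHI result come back in the order of the values passed in i, so they line up with the rows of DT2; the intermediate name res is only illustrative.

```r
library(data.table)

DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))

# Key DT1 on the join column so that J() can look groups up
setkey(DT1, group)

# One result row per value of DT2$group, in that order
res <- DT1[J(DT2$group), list(popular = .N >= 2L), by = .EACHI]

# Add the flag to DT2 by reference (res is an illustrative name)
DT2[, popular := res$popular]
print(DT2)
```

After this, DT2 carries the popular column alongside group, and DT1 is left unchanged apart from being keyed.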


In 1.9.4, .() was introduced as an alias for list() in all of i, j and by. So you could also do:

DT1[.(DT2$group), .(popular = .N >= 2L), by=.EACHI]

When you're joining by a single character column, you can drop the .() / J() syntax altogether (for convenience). So this can also be written as:

DT1[DT2$group, .(popular = .N >= 2L), by=.EACHI]
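All of the variants above rely on DT1 being keyed by group. As a hedged alternative that is not part of the original answer, later data.table releases (1.9.6 and up, which added the on= argument) let you name the join column directly and skip setkey. A minimal sketch under that version assumption:

```r
library(data.table)

DT1 <- data.table(num = 1:6, group = c("A", "B", "B", "B", "A", "C"))
DT2 <- data.table(group = c("A", "B", "C", "D"))

# on= names the join column, so DT1 does not need a key here
# (assumes data.table >= 1.9.6; res is an illustrative name)
res <- DT1[DT2, .(popular = .N >= 2L), on = "group", by = .EACHI]
print(res)
```

Passing DT2 itself as i keeps the column name group on both sides of the join, which is why on = "group" is enough.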
