`by` and `.EACHI` in data.table

Question

library(data.table)
a = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), t = c(1L, 2L, 1L, 2L, NA_integer_), x = 11:15)
b = data.table(id = 1:2, y = c(11L, 15L))

# > a
#    id  t  x
# 1:  1  1 11
# 2:  1  2 12
# 3:  2  1 13
# 4:  3  2 14
# 5: NA NA 15

# > b
#    id  y
# 1:  1 11
# 2:  2 15

a[b, on=.(id), sum(x), by = .(id)]
# > a[b, on=.(id), sum(x), by = .(id)]
#    id V1
# 1:  1 23
# 2:  1 13

Why does the above query not return id = 2, V1 = 13 in the second row? I get what I would expect using by=.EACHI though:

a[b, on=.(id), sum(x), by = .EACHI]
# > a[b, on=.(id), sum(x), by = .EACHI]
#    id V1
# 1:  1 23
# 2:  2 13
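With `by = .EACHI`, `j` is evaluated once per row of `b`, and `b`'s own non-join columns (here `y`) can be used inside `j`. A minimal sketch reusing the question's data; the names `res` and `sum_x` are purely illustrative:

```r
library(data.table)

# Same data as in the question
a = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
               t  = c(1L, 2L, 1L, 2L, NA_integer_),
               x  = 11:15)
b = data.table(id = 1:2, y = c(11L, 15L))

# j runs once per row of b: x holds the matching rows of a,
# y holds the value from the current row of b
res = a[b, on = .(id), .(sum_x = sum(x), y), by = .EACHI]
print(res)
# expected values: id = 1, 2; sum_x = 23, 13; y = 11, 15
```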


Answer

It seems that when doing a right join between two data.tables, we should use by = .EACHI and not group by any variables from the right table (b here), as they won't be accessible in the resulting joined table. That's why by = .(id) in the first query doesn't work.
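One alternative, not taken from the original answer, is to run the join on its own and do the grouping in a second, chained call. `by` then operates on an ordinary table in which both `a`'s and `b`'s columns exist. A minimal sketch under the question's data; `joined`, `by_id` and `by_y` are illustrative names:

```r
library(data.table)

a = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
               t  = c(1L, 2L, 1L, 2L, NA_integer_),
               x  = 11:15)
b = data.table(id = 1:2, y = c(11L, 15L))

# Step 1: the join alone (no j, no by); rows of a matching each id in b,
# with b's column y attached
joined = a[b, on = .(id)]

# Step 2: ordinary grouping on the joined result
by_id = joined[, sum(x), by = .(id)]
# b's column y is a normal column here, so it can be grouped on as well
by_y = joined[, sum(x), by = .(y)]

print(by_id)  # expected values: id = 1, 2; V1 = 23, 13
print(by_y)   # expected values: y = 11, 15; V1 = 23, 13
```

The trade-off is that the intermediate joined table is materialized first, whereas by = .EACHI computes each group while joining.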

As noted in section 3.5.3 of http://franknarf1.github.io/r-tutorial/_book/tables.html:


Beware DT[i,on=,j,by=bycols]. Just to repeat: only by=.EACHI works in a join. Typing other by= values there will cause i’s columns to become unavailable

This query helped me understand the above statement a little better:

a[b, .SD, on = .(id)]
#    id t  x
# 1:  1 1 11
# 2:  1 2 12
# 3:  2 1 13

The columns from b, besides id, are not accessible in .SD for this join.
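For contrast, leaving `j` out entirely returns the full join result, in which `b`'s non-join column `y` does appear. A small check using the question's data (`joined` is an illustrative name):

```r
library(data.table)

a = data.table(id = c(1L, 1L, 2L, 3L, NA_integer_),
               t  = c(1L, 2L, 1L, 2L, NA_integer_),
               x  = 11:15)
b = data.table(id = 1:2, y = c(11L, 15L))

# Plain right join: a's columns followed by b's non-join columns
joined = a[b, on = .(id)]
names(joined)  # "id" "t" "x" "y"
nrow(joined)   # 3: two rows of a match id 1, one row matches id 2
```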

I guess that means that in a join like the one above, by must be either .EACHI or a column name from the left table (a here) that is not the join variable (as the question above shows, id doesn't work correctly, even though it is also a column of a). Using a non-join column from a does seem to work correctly:

a[b, sum(x), on = .(id), by = .(t)]
#    t V1
# 1: 1 24
# 2: 2 12
