.EACHI in data.table?
Question
I cannot seem to find any documentation on what exactly .EACHI does in data.table. I see a brief mention of it in the documentation:
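"Aggregation for a subset of known groups is particularly efficient when passing those groups in i and setting by=.EACHI. When i is a data.table, DT[i,j,by=.EACHI] evaluates j for the groups of DT that each row in i joins to. We call this grouping by each i."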
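But what does "groups" in the context of DT mean? Is a group determined by the key that is set on DT? Is the group every distinct row that uses all the columns as the key? I fully understand how to run something like DT[i,j,by=my_grouping_variable] but am confused as to how .EACHI would work. Could someone explain please?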
Answer
I've added this to the list here. Hopefully we'll be able to deliver as planned.
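The reason is most likely that by=.EACHI is a recent feature (since 1.9.4), but what it does isn't. Let me explain with an example. Suppose we have two data.tables X and Y:
library(data.table)
X = data.table(x = c(1,1,1,2,2,5,6), y = 1:7, key = "x")
Y = data.table(x = c(2,6), z = letters[2:1], key = "x")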
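We know that we can join by doing X[Y]. This is similar to a subset operation, but using data.tables (instead of integers / row names or logical values). For each row in Y, taking Y's key columns, it finds and returns the corresponding matching rows in X's key columns (+ columns in Y).
X[Y]
#    x y z
# 1: 2 4 b
# 2: 2 5 b
# 3: 6 7 a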
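Now let's say we'd like to get, for each row from Y's key columns (here only one key column), the count of matches in X. In versions of data.table < 1.9.4, we could do this by simply specifying .N in j as follows:
# < 1.9.4
X[Y, .N]
#    x N
# 1: 2 2
# 2: 6 1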
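What this implicitly does is, in the presence of j, evaluate the j-expression on each matched result of X (corresponding to the row in Y). This was called by-without-by or implicit-by, because it's as if there's a hidden by.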
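The implicit grouping described above can be mimicked by hand, which may make the idea of "one group per row of i" more concrete. The following is only a sketch, not how data.table implements it internally: it loops over the rows of Y, joins each one to X, and counts the matches. The name `manual_counts` is hypothetical, and the `on=` argument assumes data.table 1.9.6 or later.

```r
library(data.table)

X <- data.table(x = c(1, 1, 1, 2, 2, 5, 6), y = 1:7, key = "x")
Y <- data.table(x = c(2, 6), z = letters[2:1], key = "x")

# For each row of Y, find the rows of X it joins to and evaluate j (.N) on them.
# `manual_counts` is a hypothetical name used only for this illustration.
manual_counts <- rbindlist(lapply(seq_len(nrow(Y)), function(k) {
  matched <- X[Y[k], on = "x", nomatch = 0L]  # rows of X matching the k-th row of Y
  data.table(x = Y$x[k], N = nrow(matched))   # one result row per row of Y
}))
manual_counts
```

Each row of Y defines one group, and the group consists of the rows of X that the row joins to. No key on all columns is involved: only the join columns matter.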
The issue was that this would always perform a by operation. So, if we wanted to know the number of rows after a join, we had to do:
X[Y][, .N]
(or simply nrow(X[Y]) in this case). That is, we couldn't have the j expression in the same call if we didn't want a by-without-by. As a result, when we did for example X[Y, list(z)], it evaluated list(z) using by-without-by and was therefore slightly slower.
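Additionally, data.table users requested this to be explicit - see this and this for more context.
Hence by=.EACHI was added. Now, when we do:
X[Y, .N]
# [1] 3
it does what it's meant to do (avoids confusion). It returns the number of rows resulting from the join.
And,
X[Y, .N, by=.EACHI]
evaluates the j-expression on the matching rows for each row in Y (corresponding to the value from Y's key columns here). It is easier to see this by using which=TRUE.
X[.(2), which=TRUE] # [1] 4 5
X[.(6), which=TRUE] # [1] 7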
If we run .N for each, then we should get 2,1.
X[Y, .N, by=.EACHI]
#    x N
# 1: 2 2
# 2: 6 1
So we now have both functionalities. Hope this helps.
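The same pattern works for any j-expression, not just .N. As a small sketch with the X and Y from this answer (the column name `total_y` is hypothetical, and data.table 1.9.4 or later is assumed), the two calls below contrast evaluating j once on the whole join with evaluating it once per row of Y:

```r
library(data.table)

X <- data.table(x = c(1, 1, 1, 2, 2, 5, 6), y = 1:7, key = "x")
Y <- data.table(x = c(2, 6), z = letters[2:1], key = "x")

# Without by: j is evaluated once, on the full result of the join.
n_joined <- X[Y, .N]

# With by = .EACHI: j is evaluated once per row of Y, on the rows of X it matches.
# `total_y` is a hypothetical column name chosen for this illustration.
per_row <- X[Y, .(total_y = sum(y)), by = .EACHI]
per_row
# x = 2 matches y = 4 and 5 (sum 9); x = 6 matches y = 7 (sum 7)
```

Doing the aggregation inside the join avoids materialising X[Y] first and grouping it afterwards, which is presumably what the documentation means by "particularly efficient".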