What does .SD stand for in data.table in R
Question
.SD looks useful, but I do not really know what I am doing with it. What does it stand for? Why is there a preceding period (full stop)? What is happening when I use it?
I read: .SD is a data.table containing the subset of x's data for each group, excluding the group column(s). It can be used when grouping by i, when grouping by by, keyed by, and _ad hoc_ by.
Does that mean that the daughter data.tables are held in memory for the next operation?
Recommended answer
.SD stands for something like "Subset of Data.table". There's no significance to the initial ".", except that it makes it even more unlikely that there will be a clash with a user-defined column name.
If this is your data.table:
library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)
DT
#    x y v
# 1: a 1 1
# 2: b 1 3
# 3: c 1 5
# 4: a 3 2
# 5: b 3 4
# 6: c 3 6
Doing this may help you see what .SD is:
DT[ , .SD[ , paste(x, v, sep="", collapse="_")], by=y]
#    y       V1
# 1: 1 a1_b3_c5
# 2: 3 a2_b4_c6
Basically, the by=y statement breaks the original data.table into these two sub-data.tables
DT[ , print(.SD), by=y]
# <1st sub-data.table, called '.SD' while it's being operated on>
# x v
# 1: a 1
# 2: b 3
# 3: c 5
# <2nd sub-data.table, ALSO called '.SD' while it's being operated on>
# x v
# 1: a 2
# 2: b 4
# 3: c 6
# <final output, since print() doesn't return anything>
# Empty data.table (0 rows) of 1 col: y
and operates on them in turn.
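To check what each .SD contains without printing it, one can summarise it per group. The sketch below rebuilds the same DT and reports the column names and row count of .SD for each value of y. The result names info, cols and n are illustrative choices, not part of data.table.

```r
library(data.table)

# Same table as in the answer
DT <- data.table(x = rep(c("a", "b", "c"), each = 2), y = c(1, 3), v = 1:6)
setkey(DT, y)

# For each group of y, describe the sub-data.table that .SD refers to.
# The grouping column y is not part of .SD, so only x and v are listed.
info <- DT[, .(cols = paste(names(.SD), collapse = ","), n = nrow(.SD)), by = y]
info
```

Both groups should report the columns x,v and three rows, which matches the two sub-data.tables printed above.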
While it is operating on either one, it lets you refer to the current sub-data.table by using the nickname/handle/symbol .SD. That's very handy, as you can access and operate on the columns just as if you were sitting at the command line working with a single data.table called .SD ... except that here, data.table will carry out those operations on every single sub-data.table defined by combinations of the key, "pasting" them back together and returning the results in a single data.table!
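As a minimal sketch of that "operate on each piece, then paste back together" pattern, two common idioms are shown below: applying a function to every column of .SD, and taking the first row of each sub-data.table. The .SDcols argument is a standard data.table argument that restricts which columns .SD contains. The names sums and firsts are illustrative.

```r
library(data.table)

# Same table as in the answer
DT <- data.table(x = rep(c("a", "b", "c"), each = 2), y = c(1, 3), v = 1:6)
setkey(DT, y)

# Sum v within each group of y; .SDcols limits .SD to the column v
sums <- DT[, lapply(.SD, sum), by = y, .SDcols = "v"]
sums    # v is 9 for y == 1 and 12 for y == 3

# First row of each sub-data.table, pasted back into one result
firsts <- DT[, .SD[1], by = y]
firsts  # x = "a", v = 1 for y == 1; x = "a", v = 2 for y == 3
```

In both calls the j expression is written as if .SD were an ordinary data.table, and data.table evaluates it once per group before combining the results.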