Using .I to return row numbers with data.table package
Question
Would someone please explain to me the correct usage of .I for returning the row numbers of a data.table?
I have data like this:
library(data.table)
DT <- data.table(X=c(5, 15, 20, 25, 30))
DT
# X
# 1: 5
# 2: 15
# 3: 20
# 4: 25
# 5: 30
I want to return a vector of row indices where a condition in i is TRUE, e.g. which rows have an X greater than 20.
DT[X > 20]
# rows 4 & 5 are greater than 20
To get the indices, I tried:
DT[X > 20, .I]
# [1] 1 2
...but clearly I am doing it wrong, because that simply returns a vector containing 1 to the number of returned rows. (Which I thought was pretty much what .N was for?)
Sorry if this seems extremely basic, but all I have been able to find in the data.table documentation is WHAT .I and .N do, not HOW to use them.
Answer
If all you want is the row numbers rather than the rows themselves, then use which = TRUE, not .I.
DT[X > 20, which = TRUE]
# [1] 4 5
That way you still get the benefits of the optimizations applied to i, for example fast joins or automatic indexing, and which = TRUE makes the query return early with just the row numbers instead of the matching rows.
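A minimal sketch, reusing the question's DT, that contrasts which = TRUE with base R's which() and with .N. The variable names rows, base_rows and n are illustrative only; it assumes a reasonably recent data.table is installed.

```r
library(data.table)

# Same data as in the question
DT <- data.table(X = c(5, 15, 20, 25, 30))

# which = TRUE: row numbers of DT for which the i condition is TRUE
rows <- DT[X > 20, which = TRUE]
print(rows)  # [1] 4 5

# Base R equivalent, for comparison
base_rows <- which(DT$X > 20)

# .N, by contrast, is a single integer: how many rows i selected
n <- DT[X > 20, .N]
print(n)     # [1] 2
```

So .N answers "how many rows", while which = TRUE answers "which rows", both driven by the same i.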
Here's the manual entry for the which argument of data.table:
TRUE returns the row numbers of x that i matches to. If NA, returns the row numbers of i that have no match in x. By default FALSE and the rows in x that match are returned.
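The manual entry is easiest to read with a join as i. The sketch below is an illustration, not part of the original answer: the lookup tables are hypothetical, and it assumes a data.table version that supports the on= argument and the default nomatch = NA. With which = TRUE you get rows of x (DT); with which = NA you get rows of i (the lookup table) that found no match.

```r
library(data.table)

DT <- data.table(X = c(5, 15, 20, 25, 30))

# Hypothetical lookups: 5 and 30 exist in DT$X, 7 does not
found  <- data.table(X = c(5, 30))
lookup <- data.table(X = c(5, 7, 30))

# which = TRUE on a join: row numbers of x (DT) matched by i
hit <- DT[found, on = "X", which = TRUE]
print(hit)   # [1] 1 5

# which = NA: row numbers of i (lookup) with no match in x (DT)
miss <- DT[lookup, on = "X", which = NA]
print(miss)  # [1] 2
```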
Explanation:
Notice there is a specific relationship between .I and the i = .. argument in DT[i = .., j = .., by = ..]. Namely, .I is a vector of row numbers of the subsetted table.
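Because .I follows the table that j sees, leaving i empty makes .I equal to seq_len(nrow(DT)), and indexing it inside j gives row numbers of the full table. Inside by, the data.table help describes .I as holding each group's row locations in x. A small sketch of both, using a hypothetical grp column added to the question's data:

```r
library(data.table)

# Hypothetical grouped version of the question's data
DT <- data.table(grp = c("a", "a", "b", "b", "b"),
                 X   = c(5, 15, 20, 25, 30))

# No i: .I is 1..nrow(DT), so subsetting it by a condition
# yields row numbers in the original table
idx <- DT[, .I[X > 20]]
print(idx)     # [1] 4 5

# With by: .I holds each group's row numbers in DT; pick the
# row holding the largest X within each group
top <- DT[, .I[which.max(X)], by = grp]
print(top$V1)  # [1] 2 5
```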
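### Let's create some sample data
### (the outputs below come from R < 3.6.0; since R 3.6.0, sample() draws differently after set.seed(1))
set.seed(1)
LL <- sample(LETTERS[1:5], 20, TRUE)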
DT <- data.table(X=LL)
Look at the difference between subsetting the whole table and subsetting just .I:
DT[X == "B", .I]
# [1] 1 2 3 4 5 6
DT[ , .I[X == "B"] ]
# [1] 1 2 5 11 14 19
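Since the exact letters drawn depend on the R version, a version-independent way to check the second form is to compare it with which = TRUE and with base R's which(). This is a sketch under that assumption; the names via_I, via_which and via_base are illustrative.

```r
library(data.table)

# Same sample data as above; rows are not hard-coded because
# sample() output differs across R versions
set.seed(1)
DT <- data.table(X = sample(LETTERS[1:5], 20, TRUE))

via_I     <- DT[, .I[X == "B"]]          # .I over the whole table
via_which <- DT[X == "B", which = TRUE]  # which = TRUE (may use an auto-index)
via_base  <- which(DT$X == "B")          # base R, for reference

print(via_I)
```

All three should give the same row numbers; which = TRUE is the one that lets data.table optimize i.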