Using .I to return row numbers with the data.table package

Question

Would someone please explain to me the correct usage of .I for returning the row numbers of a data.table?

I have data like this:

require(data.table)
DT <- data.table(X=c(5, 15, 20, 25, 30))
DT
#     X
# 1:  5
# 2: 15
# 3: 20
# 4: 25
# 5: 30

I want to return a vector of row indices where a condition in i is TRUE, e.g. which rows have an X greater than 20.

DT[X > 20]
# rows 4 & 5 are greater than 20

To get the indices, I tried:

DT[X > 20, .I]
# [1] 1 2 

...but clearly I am doing it wrong, because that simply returns a vector containing 1 to the number of returned rows. (Which I thought was pretty much what .N was for?).

Sorry if this seems extremely basic, but all I have been able to find in the data.table documentation is WHAT .I and .N do, not HOW to use them.

Recommended answer

If all you want is the row numbers rather than the rows themselves, then use which = TRUE, not .I.

DT[X > 20, which = TRUE]
# [1] 4 5

That way you get the benefits of optimization of i, for example fast joins or using an automatic index. The which = TRUE makes it return early with just the row numbers.

Here's the manual entry for the which argument in data.table:

TRUE returns the row numbers of x that i matches to. If NA, returns the row numbers of i that have no match in x. By default FALSE and the rows in x that match are returned.
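The difference between which = TRUE and which = NA is easiest to see in a join. The following is a minimal sketch, not part of the original answer; the tables `x` and `i` and their columns are made up for illustration, and `x` is deliberately kept sorted by `id`.

```r
library(data.table)

# Hypothetical lookup table (x) and query table (i), for illustration only
x <- data.table(id = c("a", "b", "c", "d"), v = 1:4)
i <- data.table(id = c("b", "d", "z"))

# which = TRUE: for each row of i, the row number of x it matches
# (NA where there is no match, because nomatch defaults to NA)
matched <- x[i, on = "id", which = TRUE]

# which = NA: the row numbers of i that have no match in x
unmatched <- x[i, on = "id", which = NA]

print(matched)
print(unmatched)
```

Here "z" is the only id in `i` that is missing from `x`, so it shows up as an NA in `matched` and as row 3 of `i` in `unmatched`.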

Explanation:

Notice there is a specific relationship between .I and the i = .. argument in DT[i = .., j = .., by = ..]. Namely, i is applied first, so inside j, .I is a vector of row numbers of the subsetted table, not of the original table.

### Let's create some sample data
set.seed(1)
LL <- sample(LETTERS[1:5], 20, TRUE)
DT <- data.table(X=LL)

Look at the difference between subsetting the whole table and subsetting just .I:

DT[X == "B", .I]
# [1] 1 2 3 4 5 6

DT[  , .I[X == "B"] ]
# [1]  1  2  5 11 14 19
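Where .I tends to earn its keep is with by, since it then holds each group's row locations in the original table. The sketch below is an illustration rather than part of the original answer; the table `scores` and its columns `g` and `v` are hypothetical. It first checks that subsetting .I inside j agrees with which = TRUE, then picks the row number of the largest `v` within each group.

```r
library(data.table)

# Hypothetical data, for illustration only
scores <- data.table(g = c("a", "a", "b", "b", "b"),
                     v = c(3, 7, 2, 9, 6))

# With an empty i, .I spans the whole table, so subsetting it inside j
# gives original row numbers, the same as which = TRUE
via_I     <- scores[, .I[v > 5]]
via_which <- scores[v > 5, which = TRUE]
identical(via_I, via_which)

# Grouped use: original row number of the largest v within each group
idx <- scores[, .I[which.max(v)], by = g]$V1
idx
# [1] 2 4

# The row numbers can then be used to pull those rows back out
scores[idx]
```

For a plain, ungrouped condition which = TRUE remains the simpler choice; the .I form is mainly useful once by is involved.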
