Using .I to return row numbers with data.table package


Question

Would someone please explain to me the correct usage of .I for returning the row numbers of a data.table?

I have data like this:

require(data.table)
DT <- data.table(X=c(5, 15, 20, 25, 30))
DT
#     X
# 1:  5
# 2: 15
# 3: 20
# 4: 25
# 5: 30

I want to return a vector of row indices where a condition in i is TRUE, e.g. which rows have an X greater than 20.

DT[X > 20]
# rows 4 & 5 are greater than 20

To get the indices, I tried:

DT[X > 20, .I]
# [1] 1 2 

...but clearly I am doing it wrong, because that simply returns a vector containing 1 to the number of returned rows. (Which I thought was pretty much what .N was for?).

Sorry if this seems extremely basic, but all I have been able to find in the data.table documentation is WHAT .I and .N do, not HOW to use them.

Recommended answer

If all you want is the row numbers rather than the rows themselves, then use which = TRUE, not .I.

DT[X > 20, which = TRUE]
# [1] 4 5

That way you keep the benefits of data.table's optimization of i, for example fast joins or use of an automatic index. Setting which = TRUE makes the call return early with just the row numbers.

Here's the manual entry for the which argument inside data.table:


TRUE returns the row numbers of x that i matches to. If NA, returns the row numbers of i that have no match in x. By default FALSE and the rows in x that match are returned.
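
To make the TRUE and NA cases in that manual entry concrete, here is a minimal sketch using a join. The `lookup` table and the value 99 are made up for illustration and are not part of the original question; the behaviour shown assumes the default `nomatch = NA`.

```r
library(data.table)

DT <- data.table(X = c(5, 15, 20, 25, 30))

# Hypothetical lookup table: 99 is deliberately absent from DT.
lookup <- data.table(X = c(25, 99, 5))

# which = TRUE: for each row of lookup (i), the row number of DT (x) it matches.
# Rows of i without a match show up as NA under the default nomatch = NA.
matched <- DT[lookup, on = "X", which = TRUE]
matched
# [1]  4 NA  1

# which = NA: the row numbers of lookup (i) that have no match in DT (x).
unmatched <- DT[lookup, on = "X", which = NA]
unmatched
# [1] 2
```

Note that with `which = NA` the numbers refer to rows of `i` (here `lookup`), not rows of `DT`.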






Explanation:

Notice there is a specific relationship between .I and the i = .. argument in DT[i = .., j = .., by = ..]. Namely, i is applied first and j is evaluated on the result, so .I is a vector of row numbers of the subsetted table, not of the original one.

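### Let's create some sample data
### (the outputs below are from the original answer; R 3.6.0 changed the default
###  sample() algorithm, so the same seed can yield different letters on newer R)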
set.seed(1)
LL <- sample(LETTERS[1:5], 20, TRUE)
DT <- data.table(X=LL)



Look at the difference between subsetting the whole table and subsetting just .I:

DT[X == "B", .I]
# [1] 1 2 3 4 5 6

DT[  , .I[X == "B"] ]
# [1]  1  2  5 11 14 19
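
As a quick cross-check of the two approaches, the sketch below uses a small hand-written table (not the sampled data above, so it does not depend on the random seed) and compares `which = TRUE`, subsetting `.I` inside j, and base R's `which()`. The last line is an additional illustration, not from the original answer, of the case where `.I` is hard to replace: combining it with `by` to get original row numbers per group.

```r
library(data.table)

# Small fixed example so the result does not depend on sample().
DT <- data.table(X = c("B", "B", "C", "E", "B", "A"))

# Three ways to get the ORIGINAL row numbers where X == "B".
via_which_arg <- DT[X == "B", which = TRUE]  # subset in i, return early
via_dot_I     <- DT[, .I[X == "B"]]          # no subset in i, filter .I in j
via_base      <- which(DT$X == "B")          # plain base R

via_which_arg
# [1] 1 2 5

# .I combined with by: original row number of the first row in each group.
first_rows <- DT[, .(first_row = .I[1]), by = X]
```

All three vectors should be identical here; `which = TRUE` is the one that lets data.table optimize i, as described in the answer.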
