Use of match within i of data.table
Question
The %in% operator is a wrapper around match that returns "a vector of the same length as x". For instance:
match(c("a", "b", "c"), c("a", "a"), nomatch = 0) > 0
## [1] TRUE FALSE FALSE
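The base R documentation for match (?match) describes %in% as essentially match(x, table, nomatch = 0L) > 0L. The sketch below re-creates that definition under a hypothetical name, %in_sketch%, to show that the result has exactly one element per element of x, however many duplicates the lookup vector contains.

```r
# Hypothetical re-implementation of base R's %in%, for illustration only.
# match() returns, for each element of x, the position of its first match in
# table (0 when there is none, because nomatch = 0L), so the result always
# has length(x) elements.
`%in_sketch%` <- function(x, table) match(x, table, nomatch = 0L) > 0L

x <- c("a", "b", "c")
lookup <- c("a", "a")  # duplicated lookup values, as in dt2$v1

res <- x %in_sketch% lookup
print(res)

# Same answer as base %in%, and one logical per element of x.
stopifnot(identical(res, x %in% lookup))
stopifnot(length(res) == length(x))
```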
When used within i of a data.table, however:
library(data.table)
(dt1 <- data.table(v1 = c("a", "b", "c"), v2 = "dt1"))
v1 v2
1: a dt1
2: b dt1
3: c dt1
(dt2 <- data.table(v1 = c("a", "a"), v2 = "dt2"))
v1 v2
1: a dt2
2: a dt2
dt1[v1 %in% dt2$v1]
v1 v2
1: a dt1
2: a dt1
duplicate rows are returned. Shouldn't %in% within i of a data.table be expected to give the same result as
dt1[dt1$v1 %in% dt2$v1]
v1 v2
1: a dt1
i.e., without duplicates?
Recommended answer
This was a bug in the automatic indexing of data.table versions < 1.9.5; it was fixed in version 1.9.5 and later.
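The duplicated rows in the question look like join semantics rather than membership semantics: a join pairs each row with every matching key, so a duplicated key duplicates rows, whereas %in% only asks whether a value is present. The base R contrast below illustrates that difference with merge() on plain data.frame copies of dt1 and dt2 (the names df1 and df2 are illustrative). It is offered as intuition for the output in the question, not as a description of data.table's internal auto-indexing code.

```r
# Plain data.frame analogues of dt1 and dt2 from the question.
df1 <- data.frame(v1 = c("a", "b", "c"), v2 = "dt1", stringsAsFactors = FALSE)
df2 <- data.frame(v1 = c("a", "a"), v2 = "dt2", stringsAsFactors = FALSE)

# Join semantics: each row of df1 is paired with every matching row of
# df2["v1"], so the duplicated "a" produces two output rows.
joined <- merge(df1, df2["v1"], by = "v1")
print(joined)

# Membership semantics: one logical per row of df1, so duplicates in the
# lookup vector do not matter and each row is kept at most once.
member <- df1[df1$v1 %in% df2$v1, , drop = FALSE]
print(member)

stopifnot(nrow(joined) == 2, nrow(member) == 1)
```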
I can think of 3 possible workarounds:
- Disable auto indexing and use base R %in%, as in:
options(datatable.auto.index = FALSE)
dt1[v1 %in% dt2$v1]
## v1 v2
## 1: a dt1
- Use the built-in %chin% operator, which is both more efficient and free of this bug (it works only for comparing character vectors):
dt1[v1 %chin% dt2$v1]
## v1 v2
## 1: a dt1
- Install the development version from GitHub (close all your R sessions first, then reopen just one):
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
library(data.table)
dt1 <- data.table(v1 = c("a", "b", "c"), v2 = "dt1")
dt2 <- data.table(v1 = c("a", "a"), v2 = "dt2")
dt1[v1 %in% dt2$v1]
## v1 v2
## 1: a dt1