Different results when subsetting data.table columns with numeric indices in different ways
Problem description
See the minimal example:
library(data.table)
DT <- data.table(x = 2, y = 3, z = 4)
DT[, c(1:2)] # first way
# x y
# 1: 2 3
DT[, (1:2)] # second way
# [1] 1 2
DT[, 1:2] # third way
# x y
# 1: 2 3
As described in this post, subsetting columns with numeric indices is now possible. However, I would like to know why, in the second way, the indices are evaluated as a plain vector rather than treated as column indices.
In addition, I have just updated data.table:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.2
loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4 yaml_2.1.17
Recommended answer
By looking at the source code, we can simulate data.table's behaviour for different inputs:
if (!missing(j)) {
jsub = replace_dot_alias(substitute(j))
root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
if (root == ":" ||
(root %chin% c("-","!") && is.call(jsub[[2L]]) && jsub[[2L]][[1L]]=="(" && is.call(jsub[[2L]][[2L]]) && jsub[[2L]][[2L]][[1L]]==":") ||
( (!length(av<-all.vars(jsub)) || all(substring(av,1L,2L)=="..")) &&
root %chin% c("","c","paste","paste0","-","!") &&
missing(by) )) { # test 763. TODO: likely that !missing(by) iff with==TRUE (so, with can be removed)
# When no variable names (i.e. symbols) occur in j, scope doesn't matter because there are no symbols to find.
# If variable names do occur, but they are all prefixed with .., then that means look up in calling scope.
# Automatically set with=FALSE in this case so that DT[,1], DT[,2:3], DT[,"someCol"] and DT[,c("colB","colD")]
# work as expected. As before, a vector will never be returned, but a single column data.table
# for type consistency with >1 cases. To return a single vector use DT[["someCol"]] or DT[[3]].
# The root==":" is to allow DT[,colC:colH] even though that contains two variable names.
# root == "-" or "!" is for tests 1504.11 and 1504.13 (a : with a ! or - modifier root)
# We don't want to evaluate j at all in making this decision because i) evaluating could itself
# increment some variable and not intended to be evaluated a 2nd time later on and ii) we don't
# want decisions like this to depend on the data or vector lengths since that can introduce
# inconistency reminiscent of drop=TRUE in [.data.frame that we seek to avoid.
with=FALSE
Basically, "[.data.table" captures the expression passed to j without evaluating it and decides how to treat it based on some predefined rules. If one of the rules is satisfied, it sets with=FALSE, which means that j is taken to hold column names or positions and is evaluated using standard evaluation.
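As a small illustration of what with=FALSE means in practice, the sketch below passes it explicitly, so the decision is no longer left to the rules. It reuses the DT from the question; the variable names a, b and s are only illustrative. With an explicit with = FALSE, the extra parentheses should no longer matter, and .SDcols is another documented way to select columns by position.

```r
library(data.table)

DT <- data.table(x = 2, y = 3, z = 4)

# With an explicit with = FALSE, j is evaluated and its value is used
# as column positions, regardless of how the expression is written.
a <- DT[, (1:2), with = FALSE]
b <- DT[, 1:2, with = FALSE]

# Selecting by position through .SDcols avoids the question entirely.
s <- DT[, .SD, .SDcols = 1:2]

print(names(a))
print(names(b))
print(names(s))
```

All three results are data.tables containing the columns x and y.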
The rules are (roughly) as follows:
1. Set with=FALSE,
1.1. if the j expression is a call and the call is : or
1.2. if the call is a combination of c("-","!") and ( and : or
1.3. if some value (character, integer, numeric, etc.) or a variable prefixed with .. was passed to j, the call is in c("","c","paste","paste0","-","!"), and there is no by argument
2. Otherwise, set with=TRUE
So we can convert this into a function and see whether any of the conditions is satisfied (I've skipped the step that converts . to list, as it is irrelevant here; we will just test with list directly):
is_satisfied <- function(...) {
jsub <- substitute(...)
root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
if (root == ":" ||
(root %chin% c("-","!") &&
is.call(jsub[[2L]]) &&
jsub[[2L]][[1L]]=="(" &&
is.call(jsub[[2L]][[2L]]) &&
jsub[[2L]][[2L]][[1L]]==":") ||
( (!length(av<-all.vars(jsub)) || all(substring(av,1L,2L)=="..")) &&
root %chin% c("","c","paste","paste0","-","!"))) TRUE else FALSE
}
is_satisfied("x")
# [1] TRUE
is_satisfied(c("x", "y"))
# [1] TRUE
is_satisfied(..x)
# [1] TRUE
is_satisfied(1:2)
# [1] TRUE
is_satisfied(c(1:2))
# [1] TRUE
is_satisfied((1:2))
# [1] FALSE
is_satisfied(y)
# [1] FALSE
is_satisfied(list(x, y))
# [1] FALSE
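To see why only the second way from the question fails, it may help to look at the root of each captured expression on its own. The following is a simplified sketch in base R, not data.table's actual code: root_of and whitelist are illustrative names, and the all.vars() check and the "-"/"!" clause are left out because none of the three expressions contains a variable name or a negation.

```r
# Extract the name of the outermost call, or "" for a constant or symbol,
# mirroring the 'root' computation in the excerpt above.
root_of <- function(expr) {
  if (is.call(expr)) as.character(expr[[1L]])[1L] else ""
}

exprs <- list(
  first  = quote(c(1:2)),  # outermost call is c
  second = quote((1:2)),   # outermost call is the parenthesis function (
  third  = quote(1:2)      # outermost call is :
)

roots <- vapply(exprs, root_of, character(1))
print(roots)
# first: "c", second: "(", third: ":"

# Roots accepted by rule 1.3; ":" is accepted separately by rule 1.1.
whitelist <- c("", "c", "paste", "paste0", "-", "!")
passes <- roots == ":" | roots %in% whitelist
print(passes)
# first: TRUE, second: FALSE, third: TRUE
```

Because "(" is neither ":" nor in the whitelist, (1:2) falls through to with=TRUE and is evaluated as an ordinary expression, which is why it returns the vector 1 2.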