Select multiple columns in data.table by location

Question

What is the equivalent of selecting multiple columns in data.table just like this in data.frame?

df <- data.frame(a = 1, b = 2, c = 3)
df[, 2:3]
#   b c
# 1 2 3


Recommended answer

(For info on recent changes in data.table that obviate the need for with=FALSE (currently only available in the development version), see the UPDATE below.)

For data.table versions 1.9.6 and earlier, just set with = FALSE:

library(data.table)
dt <- data.table(a=1:2, b=2:3, c=3:4)
dt[, 2:3, with = FALSE]
#    b c
# 1: 2 3
# 2: 3 4

As far as I can tell, the argument is named "with" because it determines whether the column index should be evaluated within the frame of the data.table, as it would be when using, e.g., base R's with() and within().

From ?data.table:


By default with=TRUE and j is evaluated within the frame of x. The column names can be used as variables.

When with=FALSE j is a character vector of column names, a numeric vector of column positions to select or of the form startcol:endcol, and the value returned is always a data.table...
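
To make the two evaluation modes concrete, here is a minimal sketch, assuming a small three-column table like the one used earlier in this answer. The variable names `v`, `sub`, `df` and `w` are illustrative only, and the base R `with()` call is included purely for comparison.

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# with = TRUE (the default): j is evaluated inside the frame of dt,
# so the bare name b is the column itself and a plain vector comes back.
v <- dt[, b]

# with = FALSE: j is treated as a selector (names or positions),
# and the result is always a data.table.
sub <- dt[, c("b", "c"), with = FALSE]

# base R's with() evaluates an expression inside a data.frame in the same spirit.
df <- data.frame(a = 1:2, b = 2:3, c = 3:4)
w <- with(df, b + c)
```

The naming parallel is the point here: `with = TRUE` behaves like wrapping `j` in `with(dt, ...)`, while `with = FALSE` switches that off.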

And there is some related thinking in ?setkey :


It isn't good programming practice, in general, to use column numbers rather than names. [...] If you use column numbers then bugs (possibly silent) can more easily creep into your code as time progresses if changes are made elsewhere in your code; e.g., if you add, remove or reorder columns in a few months time, a setkey [or a select] by column number will then refer to a different column, possibly returning incorrect results with no warning. (A similar concept exists in SQL where "select * from ..." is considered poor programming style [by some] when a robust, maintainable system is required.) If you really wish to use column numbers, it's possible but deliberately a little harder; e.g., setkeyv(DT,colnames(DT)[1:2]) [or setting with=FALSE in selects].
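
The silent-failure risk described in that passage can be sketched as follows. This is an illustration under assumed data, not something taken from the documentation: the table, the reordering step and the names `cols`, `by_name` and `by_pos` are all hypothetical.

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

# Resolve positions to names once, while the column layout is known.
cols <- names(dt)[2:3]            # "b" "c"

# Later, something reorders the columns (by reference).
setcolorder(dt, c("c", "a", "b"))

# Selecting by the stored names still returns b and c ...
by_name <- dt[, cols, with = FALSE]

# ... whereas the same positions now silently pick up a and b.
by_pos <- dt[, 2:3, with = FALSE]
```

Neither call raises a warning, which is why converting positions to names early tends to be the safer habit.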






UPDATE: 2016-10-18

The current development version of data.table (v1.9.7) (installation instructions here) now implements a more data.table-consistent column selection syntax. It will ship with future stable versions distributed on CRAN, starting with v1.9.8.

Now, without explicitly setting with=FALSE, any of the following calls will work just as you'd hope/expect them to:

dt <- data.table(a = 1,b = 2,c = 3)
dt[, 2]
#    b
# 1: 2
dt[, 2:3]
#    b c
# 1: 2 3
dt[, "a"]
#    a
# 1: 1
dt[, c("a","b")]
#    a b
# 1: 1 2

The relevant NEWS entry describes this and another related change:

When j contains no unquoted variable names (whether column names or not), with= is now automatically set to FALSE. Thus, DT[,1], DT[,"someCol"], DT[,c("colA","colB")] and DT[,100:109] now work as we all expect them to; i.e., returning columns, #1188, #1149. Since there are no variable names there is no ambiguity as to what was intended. DT[,colName1:colName2] no longer needs with=FALSE either since that is also unambiguous; it's a single call to the : function so with=TRUE could make no sense, despite the presence of unquoted variable names. These changes can be made since nobody can be using the existing behaviour of returning back the literal j value since that can never be useful. This provides a new ability and should not break any existing code. Selecting a single column still returns a 1-column data.table (not a vector, unlike data.frame by default) for type consistency for code (e.g. within DT[...][...] chains) that can sometimes select several columns and sometime one, as has always been the case in data.table and we have no intention to bring back drop. In future, DT[,myCols] (i.e. a single variable name) will look for myCols in calling scope without needing to set with=FALSE too, just as a single symbol appearing in i does already. The new behaviour can be turned on now by setting the option: options(datatable.WhenJisSymbolThenCallingScope=TRUE). The default is currently FALSE to give you time to change your code. In this future state, one way (i.e. DT[,theColName]) to select the column as a vector rather than a 1-column data.table will no longer work leaving the two other ways that have always worked remaining (since data.table is still just a list after all): DT[["someCol"]] and DT$someCol. Those base R methods are faster too (when iterated many times) by avoiding the small argument checking overhead inside the more flexible DT[...] syntax as has been highlighted in example(data.table) for many years. 
In the next release, DT[,someCol] will continue with old current behaviour but start to warn if the new option is not set. Then the default will change to TRUE to nudge you to move forward whilst still retaining a way for you to restore old behaviour for this feature only, whilst still allowing you to benefit from other new features of the latest release without changing your code. Then finally after an estimated 2 years from now, the option will be removed.
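
As a small illustration of the distinction the NEWS entry draws between a 1-column data.table and a plain vector, the sketch below assumes data.table v1.9.8 or later. The variable names are illustrative only.

```r
library(data.table)

dt <- data.table(a = 1, b = 2, c = 3)

# A quoted name (or a position) in j returns a 1-column data.table.
one_col <- dt[, "b"]

# The list-style accessors return the column as a plain vector.
vec_name <- dt[["b"]]
vec_dollar <- dt$b

# [[ also accepts a position, with the same caveats about
# positional access that apply elsewhere.
vec_pos <- dt[[2]]
```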
