How do you delete a column by name in data.table?
Problem description
To get rid of a column named "foo" in a data.frame, I can do:

    df <- df[-grep('foo', colnames(df))]

However, once df is converted to a data.table object, there is no way to just remove a column.

Example:

    df <- data.frame(id = 1:100, foo = rnorm(100))
    df2 <- df[-grep('foo', colnames(df))] # works
    df3 <- data.table(df)
    df3[-grep('foo', colnames(df3))]
But once it is converted to a data.table object, this no longer works.

Solution

Any of the following will remove column foo from the data.table df3:

    # Method 1 (and preferred as it takes 0.00s even on a 20GB data.table)
    df3[, foo := NULL]
    df3[, c("foo", "bar") := NULL]  # remove two columns

    myVar = "foo"
    df3[, (myVar) := NULL]  # lookup myVar contents

    # Method 2a -- A safe idiom for excluding (possibly multiple)
    # columns matching a regex
    df3[, grep("^foo$", colnames(df3)) := NULL]

    # Method 2b -- An alternative to 2a, also "safe" in the sense described below
    df3[, which(grepl("^foo$", colnames(df3))) := NULL]
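As a rough, self-contained illustration of Methods 1 and 2a, here is a small sketch. The table dt, its column names, and the second name alias are made up for the example and are not part of the question. It is only meant to show that := NULL changes the existing table in place instead of returning a modified copy.

```r
library(data.table)

# Hypothetical example table; the column names are illustrative only.
dt <- data.table(id = 1:5, foo = rnorm(5), bar = letters[1:5], baz = 5:1)
alias <- dt  # a second name for the same table (plain assignment makes no copy)

dt[, foo := NULL]                           # Method 1: literal column name

myVar <- "bar"
dt[, (myVar) := NULL]                       # Method 1: name held in a variable

dt[, grep("^baz$", colnames(dt)) := NULL]   # Method 2a: anchored regex

print(colnames(dt))     # only "id" is left
print(colnames(alias))  # alias shows the same columns, since `:=` modified the table in place
```

Because nothing is copied, there is no need to assign the result back to dt.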
data.table also supports the following syntax:

    ## Method 3 (could then assign to df3)
    df3[, !"foo", with = FALSE]

though if you actually want to remove column "foo" from df3 (as opposed to just printing a view of df3 minus column "foo"), you'd really want to use Method 1 instead.

(Do note that if you use a method relying on grep() or grepl(), you need to set pattern="^foo$" rather than "foo" if you don't want columns with names like "fool" and "buffoon" (i.e. those containing foo as a substring) to also be matched and removed.)

Less safe options, fine for interactive use:

The next two idioms will also work if df3 contains a column matching "foo", but will fail in a probably-unexpected way if it does not. If, for instance, you use either of them to search for the non-existent column "bar", you'll end up with a zero-row data.table.

As a consequence, they are really best suited for interactive use where one might, e.g., want to display a data.table minus any columns with names containing the substring "foo". For programming purposes (or if you want to actually remove the column(s) from df3 rather than from a copy of it), Methods 1, 2a, and 2b are really the best options.

    # Method 4a:
    df3[, -grep("^foo$", colnames(df3)), with = FALSE]

    # Method 4b:
    df3[, !grepl("^foo$", colnames(df3)), with = FALSE]
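To make the note about anchoring the pattern concrete, here is a minimal sketch using a made-up table whose column names (foo, fool, buffoon) are assumptions chosen for illustration. It compares the unanchored pattern "foo" with "^foo$", using the Method 4b form so that the original table is left untouched.

```r
library(data.table)

# Hypothetical column names chosen so that "foo" also appears as a substring.
dt <- data.table(id = 1:3, foo = 1:3, fool = 4:6, buffoon = 7:9)

# Base R pattern matching on the names alone:
print(grepl("foo",   colnames(dt)))  # matches foo, fool and buffoon
print(grepl("^foo$", colnames(dt)))  # matches only the column named exactly "foo"

# Method 4b returns a new table; dt itself is not modified.
loose  <- dt[, !grepl("foo",   colnames(dt)), with = FALSE]
strict <- dt[, !grepl("^foo$", colnames(dt)), with = FALSE]

print(colnames(loose))   # "id"
print(colnames(strict))  # "id" "fool" "buffoon"
print(colnames(dt))      # all four columns are still present
```

The unanchored pattern silently drops fool and buffoon as well, which is why the anchored form is the safer default when only the exact name is meant.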