Select columns of data.table based on regex

Posted: 2020/10/15 19:15:02   Tags: regex, r, data.table

Question

How can I select columns of a data.table based on a regex? Consider a simple example as follows:

library(data.table)
mydt <- data.table(foo=c(1,2), bar=c(2,3), baz=c(3,4))

Is there a way to select the columns bar and baz from the data.table based on a regex? I know that the following solution works, but if the table is much bigger and I want to select more variables, this could easily get cumbersome.

mydt[, .(bar, baz)]

I would like something like matches() in dplyr::select(), but working by reference.

Answer

UPDATE: I updated the comparison to include @sindri_baldur's answer, using data.table version 1.12.6. According to the results, patterns() is a handy shortcut, but if performance matters, you should stick with the .. or with = FALSE solutions (see below).

Apparently, data.table 1.10.2 introduced a new way to do this: prefixing a variable that holds column names with .. inside j.

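library(data.table)
# Character vector of the column names that match the regex
cols <- grep("bar|baz", names(mydt), value = TRUE)
# The .. prefix tells data.table to look up cols in the calling scope
mydt[, ..cols]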
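One caveat about the regex itself: "bar|baz" is unanchored, so grep() also matches any longer name that merely contains bar or baz. The sketch below uses a hypothetical extra column foobar (not part of the question's data) to show how anchoring the pattern with ^ and $ restricts the match to whole column names.

```r
library(data.table)

# Hypothetical table: "foobar" also contains the substring "bar"
dt <- data.table(foo = 1, bar = 2, baz = 3, foobar = 4)

# Unanchored: matches "bar", "baz" and "foobar"
loose <- grep("bar|baz", names(dt), value = TRUE)

# Anchored: ^ and $ force the regex to match the whole column name
exact <- grep("^(bar|baz)$", names(dt), value = TRUE)

# Select only the exact matches with the .. prefix
sel <- dt[, ..exact]
print(sel)
```

In the benchmark below the extra columns are named foo1 to foo30, so the unanchored pattern happens to be safe there.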

The .. approach seems to be the fastest of the posted solutions, essentially tied with with = FALSE.
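As a quick sanity check, the three idioms compared in the update (.., with = FALSE and patterns() in .SDcols) select the same columns from the question's mydt. This is a minimal sketch; patterns() inside .SDcols needs a newer data.table than the .. prefix (1.12.0 or later, as far as I know).

```r
library(data.table)

mydt <- data.table(foo = c(1, 2), bar = c(2, 3), baz = c(3, 4))

# Character vector of matching column names: "bar", "baz"
cols <- grep("bar|baz", names(mydt), value = TRUE)

# 1. .. prefix: cols is looked up in the calling scope
res_dotdot <- mydt[, ..cols]

# 2. with = FALSE: j is treated as a vector of column names
res_with <- mydt[, cols, with = FALSE]

# 3. patterns() in .SDcols: the regex is matched against names(mydt)
res_patterns <- mydt[, .SD, .SDcols = patterns("bar|baz")]

print(res_dotdot)
```

All three return a new data.table containing bar and baz; they differ in convenience and, per the benchmark, in speed.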

library(data.table)
library(microbenchmark)

# Creating a large data.table with 100k rows, 32 columns
n <- 100000
foo_cols <- paste0("foo", 1:30)
big_dt <- data.table(bar = rnorm(n), baz = rnorm(n))
big_dt[, (foo_cols) := rnorm(n)]

# Methods
subsetting <- function(dt) {
    subset(dt, select = grep("bar|baz", names(dt)))
}

usingSD <- function(dt) {
    dt[, .SD, .SDcols = names(dt) %like% "bar|baz"]
}

usingWith <- function(dt) {
    cols <- grep("bar|baz", names(dt), value = TRUE)
    dt[, cols, with = FALSE]
}

usingDotDot <- function(dt) {
    cols <- grep("bar|baz", names(dt), value = TRUE)
    dt[, ..cols]
}

usingPatterns <- function(dt) {
    dt[, .SD, .SDcols = patterns("bar|baz")]
}

# Benchmark
microbenchmark(
    subsetting(big_dt), usingSD(big_dt), usingWith(big_dt), usingDotDot(big_dt),
    usingPatterns(big_dt),
    times = 5000
)

#Unit: microseconds
#                  expr  min   lq  mean median    uq    max neval
#    subsetting(big_dt)  430  759  1672   1309  1563  82934  5000
#       usingSD(big_dt)  547  951  1872   1461  1797  60357  5000
#     usingWith(big_dt)  278  496  1331   1112  1304  62656  5000
#   usingDotDot(big_dt)  289  483  1392   1117  1344  55878  5000
# usingPatterns(big_dt)  596 1019  1984   1518  1913 120331  5000
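All of the methods benchmarked above return a new data.table and leave mydt untouched. If you want the by-reference behaviour mentioned in the question, and you are happy to drop the non-matching columns from mydt itself, one possible sketch is to delete them in place with := NULL. It assumes modifying the original table is acceptable; keep a copy() first if it is not.

```r
library(data.table)

mydt <- data.table(foo = c(1, 2), bar = c(2, 3), baz = c(3, 4))

# Columns to keep, chosen by regex
keep <- grep("bar|baz", names(mydt), value = TRUE)

# Every other column is deleted by reference with := NULL (mydt is not copied)
drop <- setdiff(names(mydt), keep)
if (length(drop) > 0) {
  mydt[, (drop) := NULL]
}

print(mydt)
```

The length() guard simply skips the deletion when every column already matches.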
