Using grep to subset rows from a data.table, comparing row content

Views: 171 Published: 2017/3/12 10:34:04 Tags: r grep data.table string-matching

Question

library(data.table)
DT <- data.table(num=c("20031111","1112003","23423","2222004"),y=c("2003","2003","2003","2004"))

> DT
        num    y
1: 20031111 2003
2:  1112003 2003
3:    23423 2003
4:  2222004 2004

I want to compare the contents of two cells in the same row and act on the boolean result. For instance, if num matches the year in y, create a column x holding that value. I thought about subsetting based on grep, and that works, but it naturally checks the whole column every time, which seems wasteful.

DT[grep(y,num)] # runs, but with a "pattern has length > 1" warning

I could apply() my way through this, but perhaps there is a data.table way?

Thanks

Recommended answer

If you're happy using the stringi package, this approach takes advantage of the fact that the stringi functions vectorise over both pattern and string:

library(stringi)
DT[stri_detect_fixed(num, y), x := num]
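
To make the row-wise behaviour concrete, here is a minimal sketch on the sample data from the question. The helper names hit and hit_base are introduced only for illustration, and the base R mapply() cross-check is an addition that is not part of the original answer.

```r
library(data.table)
library(stringi)

DT <- data.table(num = c("20031111", "1112003", "23423", "2222004"),
                 y   = c("2003", "2003", "2003", "2004"))

# Row-wise test: does num[i] contain y[i] as a literal substring?
# stri_detect_fixed() pairs the i-th string with the i-th pattern.
hit <- stri_detect_fixed(DT$num, DT$y)

# Base R cross-check (illustrative only): one fixed-string grepl() call per row.
hit_base <- unname(mapply(grepl, DT$y, DT$num, MoreArgs = list(fixed = TRUE)))

# Fill x only for the matching rows; non-matching rows are left as NA.
DT[hit, x := num]
print(DT)
```

On this data rows 1, 2 and 4 match, so only row 3 ("23423" against "2003") ends up with NA in x.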

Depending on the data, it may be faster than the method posted by Veerenda Gadekar.

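library(microbenchmark)

# 1000 random rows: num is a random number with a random year appended
DT <- data.table(num=paste0(sample(1000), sample(2001:2010, 1000, TRUE)),
                 y=as.character(sample(2001:2010, 1000, TRUE)))
microbenchmark(
    vg = DT[, x := grep(y, num, value=TRUE, fixed=TRUE), by = .(num, y)],  # one grep() call per group
    nk = DT[stri_detect_fixed(num, y), x := num]                           # single vectorised call
)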

#Unit: microseconds
# expr      min       lq     mean   median       uq      max neval
#   vg 6027.674 6176.397 6513.860 6278.689 6370.789 9590.398   100
#   nk  975.260 1007.591 1116.594 1047.334 1110.734 3833.051   100
