Memory and Performance using grepl on large data.table
Question
I'm performing a simple command in R over a large dataset, and the result is slow and uses too much memory. Here's an example using two rows, although my real dataset has 154 million rows:
library(data.table)
Dt <- data.table(title1 = c("The coolest song ever",
                            "The greatest music in the world"),
                 title2 = c("coolest song", "greatest music"))
Dt$Match <- sapply(seq_len(nrow(Dt)), function(x) grepl(Dt$title2[x], Dt$title1[x]))
The result of Dt$Match should be TRUE, TRUE. Before running this script I have about 12 GB of RAM free, but as this slow code runs, that memory is used up.
Is there a more efficient way to get the same results? Perhaps leveraging the data.table package?
Answer

Use the stringi library; it's more performant.

stri_detect_fixed(Dt$title1, Dt$title2)

should be what you're looking for.
(Thanks to Frank, who actually found the exact data.table answer:

Dt[, stri_detect_fixed(title1, title2)]

The functions with the suffix ..._fixed are faster than the _regex ones.)
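Putting the pieces of the answer together, here is a minimal end-to-end sketch. It only uses functions already named above (data.table and stringi's stri_detect_fixed); the key point is that stri_detect_fixed is vectorized over both arguments, so the whole column pair is matched in one call instead of one grepl call per row:

```r
library(data.table)
library(stringi)

Dt <- data.table(title1 = c("The coolest song ever",
                            "The greatest music in the world"),
                 title2 = c("coolest song", "greatest music"))

# Vectorized fixed-string matching: no per-row sapply() loop and no
# regex compilation, which is where grepl() spends time and memory here.
Dt[, Match := stri_detect_fixed(title1, title2)]

print(Dt$Match)  # TRUE TRUE
```

Because title2 values like "coolest song" could contain regex metacharacters in a real dataset, treating them as fixed strings is also safer, not just faster.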