When does setting 'perl=TRUE' in 'strsplit' not work (as intended or at all)?


Question

While benchmarking some code I was trying to optimise, I observed that strsplit with perl=TRUE is faster than strsplit with perl=FALSE. For example,

set.seed(1)
ff <- function() paste(sample(10), collapse= " ")
xx <- replicate(1e5, ff())

system.time(t1 <- strsplit(xx, "[ ]"))
#  user  system elapsed 
# 1.246   0.002   1.268 

system.time(t2 <- strsplit(xx, "[ ]", perl=TRUE))
#  user  system elapsed 
# 0.389   0.001   0.392 

identical(t1, t2) 
# [1] TRUE

So my question (or rather a variation of the question in the title) is: under what circumstances would we absolutely need perl=FALSE (leaving aside the fixed and useBytes parameters)? In other words, what can be done with perl=FALSE that cannot be done with perl=TRUE?

Recommended answer

From the documentation ;)

Performance considerations

If you are doing a lot of regular expression matching, including on very long strings, you will want to consider the options used. Generally PCRE will be faster than the default regular expression engine, and fixed = TRUE faster still (especially when each pattern is matched only a few times).
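As a rough illustration of the three options the documentation mentions, here is a minimal sketch that reuses the question's test data at a smaller size (1e4 strings is my choice, not the asker's) and adds a fixed = TRUE variant. The variable names are made up for this example, and timings will vary by machine and R version, so no numbers are shown.

```r
set.seed(1)
# Smaller input than the question's 1e5 strings, so this runs quickly
xx <- replicate(1e4, paste(sample(10), collapse = " "))

# Same split expressed three ways
t_tre   <- system.time(s_tre   <- strsplit(xx, "[ ]"))               # default regex engine
t_pcre  <- system.time(s_pcre  <- strsplit(xx, "[ ]", perl = TRUE))  # PCRE
t_fixed <- system.time(s_fixed <- strsplit(xx, " ", fixed = TRUE))   # literal match, no regex

# Check that all three produce the same tokens before comparing timings
same <- identical(s_tre, s_pcre) && identical(s_tre, s_fixed)
print(same)

# Elapsed time for each variant, in seconds
print(c(tre   = t_tre[["elapsed"]],
        pcre  = t_pcre[["elapsed"]],
        fixed = t_fixed[["elapsed"]]))
```

Since the separator here is a single literal space, fixed = TRUE is applicable; it would not be if the split pattern actually needed regex features.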

Of course, this does not answer the question of "are there any dangers to always using perl=TRUE?"
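One possible danger, offered as a sketch rather than a full answer: the two engines do not accept exactly the same syntax. As far as I know, the default engine treats `\<` as a start-of-word anchor, while PCRE reads `\<` as a literal `<`, so a pattern written for the default engine can silently stop matching once perl=TRUE is switched on. The input string and variable names below are made up for illustration.

```r
x   <- "a cat"
pat <- " \\<"   # default-engine syntax: a space followed by the start of a word

# Default engine: \< is a word-start anchor, so the space should be a split point
default_split <- strsplit(x, pat)

# PCRE: \< should be read as a literal "<", which does not occur in x
pcre_split <- strsplit(x, pat, perl = TRUE)

# The PCRE spelling of a word boundary is \b
pcre_b_split <- strsplit(x, " \\b", perl = TRUE)

print(default_split)
print(pcre_split)
print(pcre_b_split)
```

If the first two results differ on your installation, that is the kind of case where blindly adding perl=TRUE changes behaviour instead of just speeding things up.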
