When does setting 'perl=TRUE' in 'strsplit' not work (as intended or at all)?
Question
I just did some benchmarking while trying to optimise some code and observed that strsplit with perl=TRUE is faster than strsplit with perl=FALSE. For example,
set.seed(1)
ff <- function() paste(sample(10), collapse= " ")
xx <- replicate(1e5, ff())
system.time(t1 <- strsplit(xx, "[ ]"))
# user system elapsed
# 1.246 0.002 1.268
system.time(t2 <- strsplit(xx, "[ ]", perl=TRUE))
# user system elapsed
# 0.389 0.001 0.392
identical(t1, t2)
# [1] TRUE
So my question (or rather a variation of the question in the title) is: under what circumstances would we absolutely need perl=FALSE (leaving out the fixed and useBytes parameters)? In other words, what can't we do using perl=TRUE that can be done by setting perl=FALSE?
Recommended answer
From the documentation ;)
Performance considerations
If you are doing a lot of regular expression matching, including on very long strings, you will want to consider the options used. Generally PCRE will be faster than the default regular expression engine, and fixed = TRUE faster still (especially when each pattern is matched only a few times).
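The ordering described in that passage can be checked with a small sketch along these lines. It reuses the input construction from the question but with a smaller sample (1e4 strings rather than 1e5, an arbitrary choice to keep it quick), and the variable names t_default, t_perl, t_fixed and same_result are made up for illustration. Actual timings depend on the machine and R build, so none are shown here.

```r
# Sketch: compare the default engine, PCRE, and fixed matching on one input.
# The sample size (1e4) is an assumption chosen to keep the run short.
set.seed(1)
ff <- function() paste(sample(10), collapse = " ")
xx <- replicate(1e4, ff())

# Time each mode and keep the split results for comparison.
t_default <- system.time(s1 <- strsplit(xx, "[ ]"))
t_perl    <- system.time(s2 <- strsplit(xx, "[ ]", perl = TRUE))
t_fixed   <- system.time(s3 <- strsplit(xx, " ", fixed = TRUE))

# For this simple single-space separator all three modes should agree.
same_result <- identical(s1, s2) && identical(s1, s3)
print(same_result)

# Elapsed seconds per mode; values will vary from run to run.
print(c(default = t_default[["elapsed"]],
        perl    = t_perl[["elapsed"]],
        fixed   = t_fixed[["elapsed"]]))
```

Note that fixed = TRUE takes the separator as a literal string, so the pattern is " " rather than the character class "[ ]"; it is only a drop-in replacement when the separator needs no regular-expression features.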
Of course, this does not answer the question of "are there any dangers to always using perl=TRUE".