Regex speed in Perl 6

Question

I've previously worked only with bash regular expressions, grep, sed, awk, etc. After trying Perl 6 regexes, I have the impression that they run slower than I would expect, but the reason is probably that I am using them incorrectly. I made a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

my @search = <abcde cdeff fabcd>;

my token search {
    @search
}

my @new_array = @array.grep({/ <search> /});
say @new_array;

Then I printed @array into a file named array (7776 lines), made a file named search with 3 lines (abcde, cdeff, fabcd), and ran a simple grep search:

$ grep -f search array
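
For anyone who wants to reproduce the grep side without running Perl 6 first, the two input files can be rebuilt directly in the shell. This is only a sketch: the scratch directory and the variable names (workdir, letters, line_count, matches) are assumptions for illustration, while the file names array and search are the ones from the question.

```shell
#!/bin/sh
# Scratch directory so the sketch does not overwrite existing files (an assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write every 5-letter string over a..f, one per line: 6^5 = 7776 lines.
letters="a b c d e f"
for a in $letters; do
  for b in $letters; do
    for c in $letters; do
      for d in $letters; do
        for e in $letters; do
          printf '%s\n' "$a$b$c$d$e"
        done
      done
    done
  done
done > array

# One search pattern per line, as in the question.
printf '%s\n' abcde cdeff fabcd > search

line_count=$(wc -l < array | tr -d ' ')
matches=$(grep -f search array)

echo "$line_count"   # 7776
echo "$matches"      # abcde, cdeff, fabcd, one per line
```

Because every line is exactly five characters long and each pattern is five characters long, the only lines that can match are the three patterns themselves.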

Both programs produced the same result, as expected, so I measured how long each one took to run:

$ time perl6 search.p6
real    0m6,683s
user    0m6,724s
sys     0m0,044s
$ time grep -f search array
real    0m0,009s
user    0m0,008s
sys     0m0,000s

So, what am I doing wrong in my Perl 6 code?

UPD: If I loop through the @search array and pass each search token to grep separately, the program runs much faster:

my @array = "aaaaa" .. "fffff";
say +@array;

my @search = <abcde cdeff fabcd>;

for @search -> $token {
  say ~@array.grep({/$token/});
}

$ time perl6 search.p6
real    0m1,378s
user    0m1,400s
sys     0m0,052s

And if I define each search pattern manually, it works even faster:

my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5

say ~@array.grep({/abcde/});
say ~@array.grep({/cdeff/});
say ~@array.grep({/fabcd/});

$ time perl6 search.p6
real    0m0,587s
user    0m0,632s
sys     0m0,036s

Recommended answer

The grep command is much simpler than Perl 6's regular expressions, and it has had many more years to get optimized. Regexes are also one of the areas that haven't seen as much optimization in Rakudo, partly because they are seen as a difficult thing to work on.

For a more performant example, you could pre-compile the regex:

my $search = "/@search.join('|')/".EVAL;
#  $search =  /abcde|cdeff|fabcd/;
say ~@array.grep($search);

That change causes it to run in about half a second.
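
For comparison with the grep side of the question, the same idea of joining the patterns into one alternation can be sketched in shell. This is not the Perl 6 code from the answer, just an illustration of the abcde|cdeff|fabcd form it builds; the variable names (search_file, pattern, matches) and the six sample strings are assumptions.

```shell
#!/bin/sh
# Hypothetical scratch file holding the three patterns from the question.
search_file=$(mktemp)
printf '%s\n' abcde cdeff fabcd > "$search_file"

# Join the lines with '|' to form a single alternation.
pattern=$(paste -s -d '|' "$search_file")

# Match a few sample strings against the combined pattern (extended regex syntax).
matches=$(printf '%s\n' aaaaa abcde bbbbb cdeff fabcd fffff | grep -E "$pattern")

echo "$pattern"
echo "$matches"

rm -f "$search_file"
```

Here the alternation is assembled once, before any matching starts, which is the same effect the pre-compiled regex above aims for.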

If there is any chance of malicious data in @search, and you have to do this, it may be safer to use:

"/@search».Str».perl.join('|')/".EVAL

The compiler can't quite generate that optimized code for /@search/, because @search could change after the regex gets compiled. What could happen is that the first time the regex is used, it gets re-compiled into the better form and is then cached for as long as @search doesn't get modified.
(I think Perl 5 does something similar.)

One important fact to keep in mind is that a regex in Perl 6 is just a method written in a domain-specific sub-language.
