Using Perl 6 to process a large text file is too slow (2014-09)


Question

The code is hosted at https://github.com/yeahnoob/perl6-perf and is as follows:

use v6;

my $file=open "wordpairs.txt", :r;

my %dict;
my $line;

repeat {
    $line=$file.get;
    my ($p1,$p2)=$line.split(' ');
    if ?%dict{$p1} {
        %dict{$p1} = "{%dict{$p1}} {$p2}".words;
    } else {
        %dict{$p1} = $p2;
    }
} while !$file.eof;

It runs well when "wordpairs.txt" is small.

But when "wordpairs.txt" is about 140,000 lines long (two words per line), it runs very slowly. It does not finish even after 20 seconds.

What is the problem? Is there a fault in the code? Thanks for any help!

The code (for now, 2014-09-04):

my %dict;
grammar WordPairs {
    token word-pair { (\S*) ' ' (\S*) "\n" }
    token TOP { <word-pair>* }
}
class WordPairsActions {
    method word-pair($/) { %dict{$0}.push($1) }
}
my $match = WordPairs.parse(slurp, :actions(WordPairsActions));
say ?$match;

Running time (for now):

$ time perl6 countpairs.pl wordpairs.txt
True
The pairs count of the key word "her" in wordpairs.txt is 1036

real    0m24.043s
user    0m23.854s
sys     0m0.181s

$ perl6 --version
This is perl6 version 2014.08 built on MoarVM version 2014.08

This timing is still not reasonable (equivalent Perl 5 code takes only about 160 ms), but it is much better than my original Perl 6 code. :)

PS. The whole thing, including the original test code, the patch, and the sample text, is on GitHub.

Recommended answer

I've tested this with code very similar to Christoph's, using a file containing 10,000 lines. It takes around 15 seconds, which, as you say, is significantly slower than Perl 5. I suspect the code is slow because something it uses hasn't seen as much optimisation effort as other parts of Rakudo and MoarVM have received recently. I expect the performance to improve dramatically over the next few months as whatever is slow gets more attention.

When trying to determine why some Perl 6 code is slow, I suggest running perl6 on MoarVM with --profile to see whether it helps you find the bottleneck. Unfortunately, with this code it will point to Rakudo internals rather than to anything you can improve yourself.

It's certainly worth talking to #perl6 on irc.freenode.net, as the people there will have the knowledge to offer an alternative solution and will be able to improve its performance in the future.
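As one possible shape for such an alternative, here is a minimal sketch that reads the pairs line by line and appends each second word with push, rather than rebuilding and re-splitting a string for every repeated key as the original loop does. The helper name build-dict and the inline sample data are hypothetical and not from the original post. The sketch has not been benchmarked here, so it may or may not be faster on a given Rakudo release.

```perl6
use v6;

# Hypothetical helper (not from the original post): maps each first word
# to the list of second words that appear with it.
sub build-dict(@lines) {
    my %dict;
    for @lines -> $line {
        # .words splits on whitespace, so a blank line yields no fields.
        my ($p1, $p2) = $line.words;
        next unless $p2.defined;    # skip blank or malformed lines
        %dict{$p1}.push($p2);       # append in place; autovivifies the array
    }
    return %dict;
}

# Small inline sample so the sketch runs without the real file.
my %dict = build-dict(("her book", "her cat", "his dog"));
say %dict<her>.elems;
```

For the real data, the same helper could be called as build-dict("wordpairs.txt".IO.lines), assuming the file sits in the current directory.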
