Text processing - Python vs Perl performance


Question

Here are my Perl and Python scripts, which do some simple text processing on about 21 log files, each about 300 KB to 1 MB (maximum), repeated 5 times (a total of 125 files, because the logs are repeated 5 times).

Python Code (modified to use compiled re and re.I)

#!/usr/bin/python

import re
import fileinput

exists_re = re.compile(r'^(.*?) INFO.*Such a record already exists', re.I)
location_re = re.compile(r'^AwbLocation (.*?) insert into', re.I)

for line in fileinput.input():
    fn = fileinput.filename()
    currline = line.rstrip()

    mprev = exists_re.search(currline)

    if(mprev):
        xlogtime = mprev.group(1)

    mcurr = location_re.search(currline)

    if(mcurr):
        print fn, xlogtime, mcurr.group(1)

Perl Code

#!/usr/bin/perl

while (<>) {
    chomp;

    if (m/^(.*?) INFO.*Such a record already exists/i) {
        $xlogtime = $1;
    }

    if (m/^AwbLocation (.*?) insert into/i) {
        print "$ARGV $xlogtime $1\n";
    }
}

On my PC, both scripts generate exactly the same result file of 10,790 lines. Here are the timings measured with Cygwin's Perl and Python implementations.

User@UserHP /cygdrive/d/tmp/Clipboard
# time /tmp/scripts/python/afs/process_file.py *log* *log* *log* *log* *log* > summarypy.log

real    0m8.185s
user    0m8.018s
sys     0m0.092s

User@UserHP /cygdrive/d/tmp/Clipboard
# time /tmp/scripts/python/afs/process_file.pl *log* *log* *log* *log* *log* > summarypl.log

real    0m1.481s
user    0m1.294s
sys     0m0.124s

Originally, this simple text processing took 10.2 seconds using Python and only 1.9 seconds using Perl.

(UPDATE) After switching to the compiled re version, Python now takes 8.2 seconds and Perl takes 1.5 seconds. Perl is still much faster.

Is there a way to improve the speed of Python at all, or is it obvious that Perl will be the speedy one for simple text processing?

By the way, this was not the only test I did for simple text processing. However I write the source code, Perl always wins by a large margin. Not once did Python perform better for simple m/regex/ match-and-print tasks.

Please do not suggest to use C, C++, Assembly, other flavours of Python, etc.

I am looking for a solution using standard Python with its built-in modules, compared against standard Perl (not even using modules). Boy, I wish I could use Python for all my tasks because of its readability, but give up speed? I don't think so.

So, please suggest how the code can be improved to get results comparable with Perl.

UPDATE: 2012-10-18

As other users suggested, Perl has its place and Python has its own.

So, for this question, one can safely conclude that for a simple regex match on each line of hundreds or thousands of text files, with the results written to a file (or printed to the screen), Perl will always, always WIN in performance. It is as simple as that.

Please note that when I say Perl wins in performance, I am comparing only standard Perl and standard Python: no resorting to obscure modules (obscure for a normal user like me), and no calling C, C++, or assembly libraries from Python or Perl. We don't have time to learn all these extra steps and installations for a simple text matching job.

So, Perl rocks for text processing and regex.

Python has its place to rock in other places.

Update 2013-05-29: An excellent article that does a similar comparison is here. Perl again wins for simple text matching. For more details, read the article.

Recommended answer

This is exactly the sort of stuff that Perl was designed to do, so it doesn't surprise me that it's faster.

One easy optimization in your Python code would be to precompile those regexes, so they aren't getting recompiled each time.

exists_re = re.compile(r'^(.*?) INFO.*Such a record already exists')
location_re = re.compile(r'^AwbLocation (.*?) insert into')

And then in your loop:

mprev = exists_re.search(currline)

mcurr = location_re.search(currline)

That by itself won't magically bring your Python script in line with your Perl script, but repeatedly calling re in a loop without compiling first is bad practice in Python.
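
For reference, here is a minimal sketch of how the precompiled patterns might fit into the whole script, written for Python 3 rather than the Python 2 of the question. The names summarize and read_all are hypothetical helpers introduced only for this illustration, and starting xlogtime as an empty string is an assumption (the original script leaves it undefined until the first "already exists" line). Note also that the re module caches recently used patterns, so the module-level functions do not fully recompile on every call; precompiling mostly saves the per-call cache lookup, which is one reason not to expect a dramatic speedup from it.

```python
import re
import sys

# Patterns copied from the question, compiled once outside the loop.
EXISTS_RE = re.compile(r'^(.*?) INFO.*Such a record already exists', re.I)
LOCATION_RE = re.compile(r'^AwbLocation (.*?) insert into', re.I)


def summarize(pairs):
    """Yield 'filename logtime location' for each AwbLocation line.

    pairs is an iterable of (filename, line) tuples (hypothetical helper API).
    """
    xlogtime = ''  # assumption: empty until an "already exists" line is seen
    for filename, line in pairs:
        currline = line.rstrip()
        mprev = EXISTS_RE.search(currline)
        if mprev:
            xlogtime = mprev.group(1)
        mcurr = LOCATION_RE.search(currline)
        if mcurr:
            yield f'{filename} {xlogtime} {mcurr.group(1)}'


def read_all(paths):
    """Yield (path, line) for every line of every file, in order."""
    for path in paths:
        with open(path, errors='replace') as handle:
            for line in handle:
                yield path, line


if __name__ == '__main__':
    # Usage: python3 process_file.py *log* > summarypy.log
    for row in summarize(read_all(sys.argv[1:])):
        print(row)
```

Keeping the matching logic in a generator that takes (filename, line) pairs makes it easy to check against a few sample lines without touching the file system, and it carries xlogtime across files the same way the original loop does.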
