Web Crawler - Python or Perl?
Question
Hi all,
I am currently planning to write my own web crawler. I know Python but
not Perl, and I am interested in knowing which of these two is the
better choice given the following scenario:
1) I/O issues: my biggest constraint in terms of resources will be
the bandwidth bottleneck.
2) Efficiency issues: the crawlers have to be fast, robust and as
"memory efficient" as possible. I am running all of my crawlers on
cheap PCs with about 500 MB RAM and P3 to P4 processors.
3) Compatibility issues: most of these crawlers will run on Unix
(FreeBSD), so there should exist a pretty good compiler that can
optimize my code under these environments.
What are your opinions?
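Since bandwidth is the stated bottleneck, a crawler usually needs an explicit rate limiter regardless of language. A minimal token-bucket sketch in Python (the class name, rates, and injected clock are illustrative choices, not something from this thread):

```python
import time

class RateLimiter:
    """Token bucket: allow at most `rate` requests per second, with
    bursts of up to `burst` requests. The clock function is injectable
    so the logic can be exercised without real sleeping."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum bucket size
        self.tokens = float(burst)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Return True if a request may be sent now, else False."""
        now = self.clock()
        # Refill tokens for the time elapsed since the last call.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The crawler would call `try_acquire()` before each fetch and sleep briefly when it returns False; injecting the clock also makes the limiter easy to unit-test.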
On Jun 9, 11:48 pm, disappeare...@gmail.com wrote:
> [the original post, quoted in full]

It really doesn't matter whether you use Perl or Python for writing
web crawlers. I have used both for writing crawlers. The scenarios you
mentioned (I/O issues, efficiency, compatibility) don't differ much
between the two languages. Both languages have fast I/O. You can use
the urllib2 module and/or Beautiful Soup for developing a crawler in
Python. For Perl you can use the Mechanize or LWP modules. Both
languages have good support for regular expressions. I have heard Perl
is slightly faster, though I don't find the difference myself. Both
are compatible with *nix. For writing a good crawler the language is
not important; it's the technology that is important.

regards,
Subeen.
http://love-python.blogspot.com/
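The fetch-parse-extract step Subeen describes can be sketched with nothing but the standard library. (Note that urllib2 is Python 2; the modern equivalents are urllib.request and html.parser. The class below is an illustrative sketch, not code from the thread.)

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(page_source, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(page_source)
    return parser.links
```

A crawler would fetch each page (e.g. with `urllib.request.urlopen`), feed the decoded body to `extract_links`, and push the returned URLs onto its work queue.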
di***********@gmail.com wrote:
> [the three requirements, quoted from the original post]

You should rethink your requirements. You expect to be I/O bound, so
why do you require a good "compiler"? Especially when asking about two
interpreted languages...

Consider using lxml (with Python); it has pretty much everything you
need for a web crawler, supports threaded parsing directly from HTTP
URLs, and it's plenty fast and pretty memory efficient.
http://codespeak.net/lxml/
Stefan
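A minimal sketch of Stefan's lxml suggestion (assumes the third-party lxml package is installed; the function name is illustrative). lxml's parse functions can also be given an HTTP URL directly, which is the feature the reply refers to; here a plain HTML string is used instead so the example needs no network:

```python
from lxml import html

def links_from_html(page_source, base_url):
    """Parse an HTML string with lxml and return absolute <a> targets."""
    doc = html.fromstring(page_source)
    # make_links_absolute resolves relative hrefs against base_url in place.
    doc.make_links_absolute(base_url)
    # iterlinks yields (element, attribute, link, pos) for every link-like
    # attribute (a/href, img/src, ...); keep only anchor hrefs.
    return [link for el, attr, link, pos in doc.iterlinks()
            if el.tag == "a" and attr == "href"]
```

Parsing the document tree once and walking it tends to be both faster and lighter on memory than regex scraping, which is the point of the recommendation.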
subeen wrote:
> can use urllib2 module and/or beautiful soup for developing crawler

Not if you care about a) speed and/or b) memory efficiency.
http://blog.ianbicking.org/2008/03/3...r-performance/
Stefan