Web Crawler - Python or Perl?


Problem Description



Hi all,
I am currently planning to write my own web crawler. I know Python but
not Perl, and I am interested in knowing which of the two is the
better choice given the following scenario:

1) I/O issues: my biggest resource constraint will be the bandwidth
bottleneck.
2) Efficiency issues: the crawlers have to be fast, robust and as
"memory efficient" as possible. I am running all of my crawlers on
cheap PCs with about 500 MB of RAM and P3 to P4 processors.
3) Compatibility issues: most of these crawlers will run on Unix
(FreeBSD), so there should exist a pretty good compiler that can
optimize my code under these environments.

What are your opinions?

Solution

On Jun 9, 11:48 pm, disappeare...@gmail.com wrote:

Hi all,
I am currently planning to write my own web crawler. I know Python but
not Perl, and I am interested in knowing which of the two is the
better choice given the following scenario:

1) I/O issues: my biggest resource constraint will be the bandwidth
bottleneck.
2) Efficiency issues: the crawlers have to be fast, robust and as
"memory efficient" as possible. I am running all of my crawlers on
cheap PCs with about 500 MB of RAM and P3 to P4 processors.
3) Compatibility issues: most of these crawlers will run on Unix
(FreeBSD), so there should exist a pretty good compiler that can
optimize my code under these environments.

What are your opinions?

It really doesn't matter whether you use Perl or Python for writing
web crawlers; I have used both. The scenarios you mention (I/O,
efficiency, compatibility) don't differ too much between the two
languages. Both have fast I/O. In Python you can use the urllib2
module and/or Beautiful Soup to develop a crawler; in Perl you can use
the Mechanize or LWP modules. Both languages have good support for
regular expressions. I have heard Perl is slightly faster, though I
can't see the difference myself. Both are compatible with *nix. For
writing a good crawler the language is not important; it's the
technique that is important.

regards,
Subeen.
http://love-python.blogspot.com/
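To make the advice above concrete, here is a minimal single-threaded crawler sketch. It deliberately uses only the standard library rather than the urllib2/Beautiful Soup stack named in the reply (urllib2 is the Python 2 name; urllib.request is its Python 3 home), so treat it as an illustrative approximation, not the poster's exact recipe:

```python
# Minimal crawler sketch, stdlib only: fetch a page, pull out the links,
# and walk them breadth-first, visiting each URL at most once.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects absolute link targets from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative hrefs against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the list of absolute URLs linked from an HTML document."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def crawl(seed, max_pages=10):
    """Breadth-first crawl from seed; returns {url: page body}."""
    seen, frontier, pages = {seed}, [seed], {}
    while frontier and len(pages) < max_pages:
        url = frontier.pop(0)
        with urlopen(url) as resp:  # network I/O dominates the run time
            body = resp.read().decode("utf-8", errors="replace")
        pages[url] = body
        for link in extract_links(body, url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages
```

A real crawler would also honor robots.txt, rate-limit per host, and handle errors; those are omitted here for brevity.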


di***********@gmail.com wrote:

1) I/O issues: my biggest constraint in terms of resource will be
bandwidth throttle neck.
2) Efficiency issues: The crawlers have to be fast, robust and as
"memory efficient" as possible. I am running all of my crawlers on
cheap pcs with about 500 mb RAM and P3 to P4 processors
3) Compatibility issues: Most of these crawlers will run on Unix
(FreeBSD), so there should exist a pretty good compiler that can
optimize my code these under the environments.

You should rethink your requirements. You expect to be I/O bound, so why do
you require a good "compiler"? Especially when asking about two interpreted
languages...
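The I/O-bound point can be made concrete: if the crawler spends its time waiting on the network, overlapping downloads matters far more than raw interpreter speed. Below is a sketch of a thread pool draining a URL queue; `fetch` is a hypothetical stand-in for a real HTTP call (e.g. urllib.request.urlopen) so the structure stays self-contained:

```python
# Sketch: worker threads overlap blocking downloads, which is where an
# I/O-bound crawler actually gains throughput. Blocking I/O releases
# the GIL, so plain threads are enough here.
import queue
import threading


def crawl_concurrently(urls, fetch, num_workers=4):
    """Fetch every URL with a pool of worker threads; returns {url: body}."""
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results, lock = {}, threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            body = fetch(url)  # the slow, blocking part
            with lock:  # dict writes guarded for clarity
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a real fetch function, the worker count would be tuned to the available bandwidth rather than to CPU cores.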

Consider using lxml (with Python); it has pretty much everything you need for
a web crawler, supports threaded parsing directly from HTTP URLs, and it's
plenty fast and pretty memory efficient.

http://codespeak.net/lxml/

Stefan


subeen wrote:

can use urllib2 module and/or beautiful soup for developing crawler

Not if you care about a) speed and/or b) memory efficiency.

http://blog.ianbicking.org/2008/03/3...r-performance/

Stefan
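One stdlib route to the memory efficiency raised above: html.parser is incremental, so a page can be fed to the parser in decoded chunks as it arrives instead of being read into memory whole and then handed to Beautiful Soup. A small sketch (the chunking here is illustrative; in practice the chunks would be successive reads from the HTTP response):

```python
# Sketch: html.parser accepts input piecewise via feed(), buffering any
# construct that is split across chunk boundaries, so peak memory stays
# near the chunk size rather than the page size.
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Accumulates the text content of <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def title_from_chunks(chunks):
    """Parse decoded text chunks incrementally; return the page title."""
    parser = TitleParser()
    for chunk in chunks:  # e.g. successive decoded 8 KB response reads
        parser.feed(chunk)
    return parser.title.strip()
```

Note that feed() takes text, so bytes coming off the socket must be decoded per chunk (or via io.TextIOWrapper) before being fed in.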

