Is Python good for web crawlers?


Problem description


I was wondering if python is a good language to build a web crawler
with? For example, to construct a program that will routinely search x
amount of sites to check the availability of a product. Or to search
for news articles containing the word 'XYZ'. These are just random
ideas to try to explain my question a bit further. Well if you have an
opinion about this please let me know because I am very interested to
hear what you have to say. Thanks.

Recommended answer

On 7 Feb 2006 08:33:28 -0800, Tempo <br*******@gmail.com> wrote:
> I was wondering if python is a good language to build a web crawler
> with? For example, to construct a program that will routinely search x
> amount of sites to check the availability of a product. Or to search
> for news articles containing the word 'XYZ'. These are just random
> ideas to try to explain my question a bit further. Well if you have an
> opinion about this please let me know because I am very interested to
> hear what you have to say. Thanks.









Google supplies a basic webcrawler as a google desktop plugin called
Kongulo (http://sourceforge.net/projects/goog-kongulo/) which is
written in python. I would think python would be perfect for this sort
of application. Your bottleneck is always going to be downloading the
page.

--
Andrew Gwozdziewycz <ap****@gmail.com>
http://ihadagreatview.org
http://plasticandroid.org
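The kind of crawler described in the question can be sketched in a few lines of standard-library Python. This is a minimal sketch, not anyone's production code: the `contains_keyword` helper and the timeout value are my own assumptions, and a real crawler would also want error handling, robots.txt checks, and politeness delays between requests.

```python
import urllib.request

def contains_keyword(html, keyword):
    # Case-insensitive substring check on already-downloaded markup
    # (hypothetical helper, not from the thread).
    return keyword.lower() in html.lower()

def crawl(urls, keyword):
    # Fetch each site in turn and collect those mentioning the keyword.
    # Each urlopen() call blocks while the page downloads -- that network
    # wait, not the surrounding Python code, dominates the runtime.
    hits = []
    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        if contains_keyword(html, keyword):
            hits.append(url)
    return hits
```

Run periodically (say from cron), `crawl(site_list, "XYZ")` covers both use cases in the question: checking product availability and scanning for articles that mention a term.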



Why do you say that the bottleneck of the crawler will always be
downloading the page? Is it because there isn't already a module to do
this and I will have to start from scratch? Or a bandwidth issue?


Tempo wrote:
> Why do you say that the bottleneck of the crawler will always be
> downloading the page? Is it because there isn't already a module to do
> this and I will have to start from scratch? Or a bandwidth issue?







Because of bandwidth - not necessarily yours directly, but the maximum flow
between your uplink and the site in question. It will always take at least
a fraction of a second up to several seconds until the data is there - in
that time, lots of python code can run.

Diez
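This point - that the interpreter sits idle while bytes arrive - is why crawlers usually overlap their downloads. A sketch of that idea, under the assumption that fetches are independent: the `fetch` stub below stands in for a real HTTP request, simulating the network wait with a short sleep so the timing effect is visible without touching the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a real download: the network wait dominates, so we
    # simulate it with a 0.2 s sleep instead of an actual request.
    time.sleep(0.2)
    return f"<html>content of {url}</html>"

def crawl_serial(urls):
    # One page at a time: total time is the sum of all the waits.
    return [fetch(u) for u in urls]

def crawl_threaded(urls, workers=8):
    # While one thread waits on the network, the others make progress,
    # so the waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

With eight URLs the serial version waits roughly 8 x 0.2 s, while the threaded version finishes in about one fetch's worth of time - exactly the gap Diez describes between download latency and the Python code that runs in between.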

