Web crawler in Python or C?


Question


Hi guys. I have to implement a topical crawler as part of my
project. Which language should I implement it in, C or Python?
Python has a fast development cycle, but speed is also a concern.
I want to strike a balance between development speed and crawler
speed. Since Python is an interpreted language, it is rather slow.
The crawler will be working on a huge set of pages, so it should be
as fast as possible. One possible approach would be to implement it
partly in C and partly in Python so that I get the best of both
worlds, but I don't know how to approach that. Can anyone guide me
on which parts I should implement in C and which in Python?

Answers

"abhinav" <ab***********@gmail.com> writes:

> The crawler which will be working on a huge set of pages should be
> as fast as possible.

What kind of network connection do you have, that's fast enough
that even a fairly CPU-inefficient crawler won't saturate it?


abhinav replies:

It is DSL broadband, 128 kbps. But that's not the point. What I am
asking is whether Python would be fine for implementing fast crawler
algorithms, or whether I should use C. The work involves handling huge
amounts of data, multithreading, file handling, ranking heuristics,
and maintaining huge data structures. Which language should I use so
as not to compromise too much on speed? How does the performance of
Python-based crawlers compare with C-based crawlers? Should I use both
languages (partly C, partly Python)? How should I decide which parts
to implement in C and which to do in Python?
Please guide me. Thanks.
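
For a sense of scale on the bandwidth question, here is a rough back-of-the-envelope sketch. The 128 kbps figure comes from the thread; the average page size of 20 KB is purely an assumption for illustration, and the function name is hypothetical.

```python
LINK_KBITS_PER_SEC = 128   # link speed quoted in the thread
AVG_PAGE_KBYTES = 20       # assumed average page size, not from the thread


def max_pages_per_second(link_kbps, page_kbytes):
    """Upper bound on pages/second if the link is the only limit."""
    bytes_per_sec = link_kbps * 1000 / 8      # kilobits/s -> bytes/s
    return bytes_per_sec / (page_kbytes * 1000)


if __name__ == "__main__":
    rate = max_pages_per_second(LINK_KBITS_PER_SEC, AVG_PAGE_KBYTES)
    print(f"at most {rate:.2f} pages/s, about {rate * 86400:.0f} pages/day")
```

Under these assumptions the link caps the crawler at under one page per second, a rate at which the choice of language is unlikely to matter much.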



abhinav wrote:

> It is DSL broadband, 128 kbps. But that's not the point. What I am
> asking is whether Python would be fine for implementing fast crawler
> algorithms, or whether I should use C.

But a web crawler is going to be *mainly* I/O bound, so language
efficiency won't be the main issue. There are several web crawlers
implemented in Python.
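
To illustrate the I/O-bound point, here is a minimal sketch that overlaps the waiting time of many requests with a thread pool. `fake_fetch` is a hypothetical stand-in that sleeps instead of touching the network, and the URLs are made up; a real crawler would pass its own fetch function. Because the threads spend nearly all their time waiting, the interpreter's speed has little effect on the total time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.05):
    """Stand-in for a network request: sleeps to mimic waiting on a server."""
    time.sleep(delay)
    return url, f"<html>page for {url}</html>"


def crawl(urls, fetch=fake_fetch, max_workers=8):
    """Fetch all URLs concurrently and return a {url: body} dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, urls))


if __name__ == "__main__":
    urls = [f"http://example.invalid/page{i}" for i in range(16)]
    start = time.perf_counter()
    pages = crawl(urls)
    elapsed = time.perf_counter() - start
    print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```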

> Handling huge data, multithreading, file handling, ranking
> heuristics, and maintaining huge data structures. Which language
> should I use so as not to compromise too much on speed? How does the
> performance of Python-based crawlers compare with C-based crawlers?
> Should I use both languages (partly C, partly Python)? How should I
> decide which parts to implement in C and which to do in Python?

If your data processing requirements are fairly heavy you will
*probably* get a speed advantage coding them in C and accessing them
from Python.
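
One way to access C code from Python is the standard `ctypes` module. The sketch below calls `strlen` from the C library only as a stand-in for a real hot spot; it assumes a POSIX system (Linux or macOS), and the wrapper name `c_strlen` is hypothetical. In a real crawler you would more likely compile your own C routines into a shared library and load that instead.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the symbols of the
# already-loaded C library. This does not work on Windows.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-free byte string, computed by C's strlen."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"crawler"))
```

Declaring `argtypes` and `restype` up front lets `ctypes` check and convert arguments rather than guessing.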

The usual advice (which seems to be applicable to you) is to
prototype in Python (which will be much more fun than in C), then test.

Profile to find your real bottlenecks (if the Python one isn't fast
enough, though it may be), and move your bottlenecks to C.
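
As a rough sketch of the profiling step, the standard `cProfile` and `pstats` modules can show where a prototype spends its time. The page-processing functions below (`extract_links`, `score_page`, `process`) are hypothetical placeholders for whatever the prototype actually does, and the sample pages are made up.

```python
import cProfile
import io
import pstats
import re

LINK_RE = re.compile(r'href="([^"]+)"')


def extract_links(html):
    """Return all href targets found in the page."""
    return LINK_RE.findall(html)


def score_page(html, topic_words):
    """Toy ranking heuristic: count occurrences of the topic words."""
    words = html.lower().split()
    return sum(words.count(w) for w in topic_words)


def process(pages, topic_words):
    """Score every page and collect its outgoing links."""
    return [(score_page(p, topic_words), extract_links(p)) for p in pages]


def profile_report(func, *args, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    pages = ['<a href="http://example.invalid/a">python crawler</a> ' * 2000] * 20
    print(profile_report(process, pages, ["python", "crawler"]))
```

Only functions that dominate such a report are worth considering for a C rewrite.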

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml



