Need to write a simple web crawler


Problem Description

Hi,

I am a student and need to write a simple web crawler using Python, and I need some guidance on how to start. I need to crawl web pages using BFS and also DFS: one using a stack and the other using a queue.

I will try it only on old web pages so that I can learn how to do it. I have taken a course called Search Engines and need some help with doing that.

Help of any kind would be appreciated.

Thank you

Solution

It's quite easy actually. You need a way to parse an HTML page (which can be found in the Python standard library) and, as you pointed out in your post, breadth-first search (BFS) and depth-first search (DFS). You also need some kind of structure to determine whether you have visited a certain page before (maybe a hash table?).

Let's assume we use BFS with Python's list type, and that you start on a certain page (www.thescripts.com?):


hash = {}                               # tracks visited pages (a dict works as a hash table)
stack = []
stack.append("http://www.thescripts.com")   # Python lists use append(), not push()

while len(stack) > 0:
    currpage = stack.pop(0)             # pop(0) takes from the front (FIFO), which gives BFS;
                                        # a plain pop() would take from the end and give DFS
    hash[currpage] = 1                  # set it to visited
    links = findlinks(currpage)         # this method finds all the links of the page
    # here you can do what you want to do, like finding some text,
    # downloading some image etc. etc.
    # push all the new links on the stack
    for link in links:
        if link not in hash:
            stack.append(link)
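The findlinks method above is left as a placeholder. As a minimal sketch of what it could look like (the class name, the use of urllib, and the UTF-8 decoding fallback are my assumptions, not part of the original answer), you can build it on HTMLParser from the Python standard library:

from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def findlinks(url):
    """Fetch a page and return all links on it as absolute URLs."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    # resolve relative links like "/about.html" against the page URL
    return [urljoin(url, link) for link in collector.links]

Calling findlinks("http://www.thescripts.com") would then return the absolute URL of every anchor on that page, ready to be appended to the list in the loop above.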

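Since the question asks for a DFS version as well, note that under the same assumptions (including the hypothetical findlinks helper above) the only change is where you take pages from the list: a plain pop() from the end makes it a LIFO stack, which gives depth-first order. A minimal sketch:

hash = {}
stack = ["http://www.thescripts.com"]

while len(stack) > 0:
    currpage = stack.pop()        # last in, first out -> depth-first
    if currpage in hash:          # skip pages we have already crawled
        continue
    hash[currpage] = 1
    for link in findlinks(currpage):
        if link not in hash:
            stack.append(link)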


Hi friend, I am also involved in developing a crawler. Please share the ideas you got.


Hi, what do you want to get from your crawl?

-kudos



