Python lxml/Beautiful Soup to find all links on a web page


Question


I am writing a script to read a web page and build a database of links that match certain criteria. Right now I am stuck with lxml and understanding how to grab all the <a href>'s from the HTML:

result = self._openurl(self.mainurl)
content = result.read()
html = lxml.html.fromstring(content)
print lxml.html.find_rel_links(html,'href')

Answer


Use XPath. Something like (can't test from here):

urls = html.xpath('//a/@href')
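Putting the two pieces together, a minimal self-contained sketch (the sample HTML and variable names are illustrative, not from the question):

```python
import lxml.html

# Stand-in for the page content the question reads via self._openurl().
content = """
<html><body>
  <a href="https://example.com/a">A</a>
  <a href="/relative/b">B</a>
  <p>no link here</p>
</body></html>
"""

# Parse the document and select the href attribute of every <a> element.
html = lxml.html.fromstring(content)
urls = html.xpath('//a/@href')
print(urls)  # ['https://example.com/a', '/relative/b']
```

Note that the XPath returns relative URLs exactly as written in the page; if you need absolute URLs for the database, lxml's `html.make_links_absolute(base_url)` can rewrite them before you run the XPath.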

