Find specific link w/ BeautifulSoup
Question
Hi, I cannot for the life of me figure out how to find links that begin with certain text. findAll('a') works fine, but it returns way too much. I just want to make a list of all links that begin with http://www.nhl.com/ice/boxscore.htm?id=
Can anyone help me?
Thank you very much.
Answer
First, set up a test document and parse it with BeautifulSoup:
>>> from BeautifulSoup import BeautifulSoup
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div><a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
>>> soup = BeautifulSoup(doc)
>>> print soup.prettify()
<html>
<body>
<div>
<a href="something">
yep
</a>
</div>
<div>
<a href="http://www.nhl.com/ice/boxscore.htm?id=3">
somelink
</a>
</div>
<a href="http://www.nhl.com/ice/boxscore.htm?id=7">
another
</a>
</body>
</html>
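The session above uses the legacy BeautifulSoup 3 package (from BeautifulSoup import BeautifulSoup), which runs only on Python 2. Assuming the maintained bs4 package is installed, a minimal Python 3 sketch of the same setup could look like this; naming the parser explicitly is a choice made here, not something the original answer does.

```python
from bs4 import BeautifulSoup  # assumes the bs4 package is installed

# Same test document as in the session above.
doc = (
    '<html><body><div><a href="something">yep</a></div>'
    '<div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div>'
    '<a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
)

# "html.parser" is Python's built-in parser; naming it explicitly avoids
# bs4's warning about guessing which parser to use.
soup = BeautifulSoup(doc, "html.parser")
print(soup.prettify())
```

In bs4 the method is spelled find_all(); findAll() is kept only as a backwards-compatible alias.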
Next, we can search for all <a> tags with an href attribute starting with http://www.nhl.com/ice/boxscore.htm?id=. You can use a regular expression for this:
>>> import re
>>> soup.findAll('a', href=re.compile(r'^http://www\.nhl\.com/ice/boxscore\.htm\?id='))
[<a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a>, <a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>]
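With the maintained bs4 package on Python 3, the same search can be written a few ways. This is a minimal sketch, assuming bs4 (and soupsieve, which bs4 uses for CSS selectors) is installed; PREFIX is just a local name chosen for illustration. Building the pattern with re.escape() makes the dots and the ? in the URL match literally instead of acting as regex metacharacters.

```python
import re

from bs4 import BeautifulSoup  # assumes the bs4 package is installed

# Hypothetical local name for the URL prefix we are filtering on.
PREFIX = "http://www.nhl.com/ice/boxscore.htm?id="

doc = (
    '<html><body><div><a href="something">yep</a></div>'
    '<div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div>'
    '<a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
)
soup = BeautifulSoup(doc, "html.parser")

# 1. Regular expression: "^" anchors at the start of the href value and
#    re.escape() escapes ".", "?" and any other metacharacters in PREFIX.
by_regex = soup.find_all("a", href=re.compile("^" + re.escape(PREFIX)))

# 2. Function filter: bs4 calls it with each tag's href value, which may be
#    None for tags without an href, so guard against that before startswith().
by_function = soup.find_all(
    "a", href=lambda h: h is not None and h.startswith(PREFIX)
)

# 3. CSS attribute selector: a[href^="..."] means "href starts with ...".
by_css = soup.select('a[href^="%s"]' % PREFIX)

# Collect just the URL strings, which is the list the question asks for.
links = [a["href"] for a in by_regex]
print(links)
```

The regex form stays closest to the original answer, while the startswith() and CSS forms avoid regular expressions entirely, so there is nothing to escape.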