Find specific link w/ BeautifulSoup
Question
Hi, I cannot for the life of me figure out how to find links that begin with certain text. findAll('a') works fine, but it returns far too much. I just want to make a list of all links that begin with http://www.nhl.com/ice/boxscore.htm?id=
Can someone help me with this?
Thanks a lot.
Recommended answer
First set up a test document and parse it with BeautifulSoup:
>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; the legacy BeautifulSoup 3 import was "from BeautifulSoup import BeautifulSoup"
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div><a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
>>> soup = BeautifulSoup(doc, 'html.parser')  # name the parser explicitly
>>> print(soup.prettify())
<html>
 <body>
  <div>
   <a href="something">
    yep
   </a>
  </div>
  <div>
   <a href="http://www.nhl.com/ice/boxscore.htm?id=3">
    somelink
   </a>
  </div>
  <a href="http://www.nhl.com/ice/boxscore.htm?id=7">
   another
  </a>
 </body>
</html>
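With the test document parsed, one way to keep only the links that start with the boxscore URL is to filter on the href attribute. The following is a minimal sketch, assuming BeautifulSoup 4 (the bs4 package) and the same test document as above; the names prefix, pattern, links and links_alt are illustrative and not part of the original answer.

```python
import re

from bs4 import BeautifulSoup

# Same test document as above.
doc = (
    '<html><body><div><a href="something">yep</a></div>'
    '<div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div>'
    '<a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>'
    '</body></html>'
)
soup = BeautifulSoup(doc, "html.parser")

# Illustrative name: the URL prefix the question asks about.
prefix = "http://www.nhl.com/ice/boxscore.htm?id="

# Option 1: a compiled regex anchored at the start of the attribute value.
# re.escape keeps the "." and "?" in the URL from acting as regex operators.
pattern = re.compile("^" + re.escape(prefix))
links = [a["href"] for a in soup.find_all("a", href=pattern)]

# Option 2: a plain function filter; href is None for tags without the attribute.
links_alt = [
    a["href"]
    for a in soup.find_all("a", href=lambda h: h is not None and h.startswith(prefix))
]

print(links)
```

Both options should select the same two boxscore links and skip the "something" link. The regex form mirrors the older findAll(..., href=re.compile(...)) idiom, while the function form avoids escaping the URL by hand.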