BeautifulSoup getting href
Question
I have the following soup:
<a href="some_url">next</a>
<span class="class">...</span>
From this I want to extract the href, "some_url".
I can do it if I only have one tag, but here there are two tags. I can also get the text 'next', but that's not what I want.
Also, is there a good description of the API somewhere, with examples? I'm using the standard documentation, but I'm looking for something a little more organized.
Recommended answer
You can use find_all in the following way to find every a element that has an href attribute, and print each one:
from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that actually have an href attribute
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])
The output would be:
Found the URL: some_url
Found the URL: another_url
Note that if you're using an older version of BeautifulSoup (before version 4), this method is named findAll. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all instead.
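If only the first link is needed, as in the question, a minimal sketch could use find instead of find_all. This assumes BeautifulSoup 4 (the bs4 package) and the built-in 'html.parser'; the names first_link and first_href are illustrative only.

```python
from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class">...</span>'''
soup = BeautifulSoup(html, 'html.parser')

# find returns the first matching tag, or None when nothing matches
first_link = soup.find('a', href=True)

# Guard against None so a page without links does not raise a TypeError
first_href = first_link['href'] if first_link is not None else None
print(first_href)
```

Indexing with ['href'] raises a KeyError when the attribute is missing, which is why the search is restricted with href=True; tag.get('href') is the alternative when a missing attribute should simply yield None.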
If you want all tags with an href, you can omit the name parameter:
href_tags = soup.find_all(href=True)
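To illustrate what omitting the name changes, here is a small sketch under the same assumptions (BeautifulSoup 4 with 'html.parser'). The link element and the style.css value are made up for the example and are not part of the question's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <link> tag is added so that more than one
# kind of tag carries an href attribute
html = '''<link href="style.css">
<a href="some_url">next</a>
<span class="class">...</span>'''
soup = BeautifulSoup(html, 'html.parser')

# No tag name given: any tag with an href attribute matches
href_tags = soup.find_all(href=True)
found = [(tag.name, tag['href']) for tag in href_tags]

for name, href in found:
    print(name, href)
```

Here the span is skipped because it has no href, while both the link and a tags are returned in document order.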