BeautifulSoup getting href
Question
I have the following soup:
<a href="some_url">next</a>
<span class="class">...</span>
From this I want to extract the href, some_url.
I can do it if I only have one tag, but here there are two tags. I can also get the text 'next', but that's not what I want.
Also, is there a good description of the API somewhere, with examples? I'm using the standard documentation, but I'm looking for something a little more organized.
Recommended answer
You can use find_all in the following way to find every a element that has an href attribute, and print each one:
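from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly; 'html.parser' ships with Python.
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only a tags that actually have an href attribute.
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])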
The output would be:
Found the URL: some_url
Found the URL: another_url
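If only the first link is needed, as in the question's soup, a single find call may be enough. The following is a minimal sketch, assuming BeautifulSoup 4 (the bs4 package) and Python's built-in html.parser; the variable names first_link, first_href and missing are only illustrative.

```python
from bs4 import BeautifulSoup

# The soup from the question: one a tag followed by a span.
html = '''<a href="some_url">next</a>
<span class="class">...</span>'''
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None when nothing matches.
first_link = soup.find("a", href=True)
first_href = first_link["href"] if first_link is not None else None
print(first_href)

# .get returns None instead of raising KeyError when the attribute is absent.
missing = soup.find("span").get("href")
print(missing)
```

Indexing a tag with ['href'] raises KeyError if the attribute is missing, so .get is the safer choice when not every tag is guaranteed to carry one.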
Note that if you're using an older version of BeautifulSoup (before version 4), the name of this method is findAll. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all instead.
If you want all tags with an href, you can omit the name parameter:
href_tags = soup.find_all(href=True)
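As a rough illustration of what omitting the name changes, the sketch below adds a link element to the sample HTML (an assumption for demonstration, not part of the original question) and collects the tag name alongside each href. It assumes BeautifulSoup 4 with the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Sample HTML from the answer, plus a hypothetical link tag for illustration.
html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>
<link href="style.css" rel="stylesheet">'''
soup = BeautifulSoup(html, "html.parser")

# No tag name given: any element with an href attribute matches.
href_tags = soup.find_all(href=True)

# Pair each tag's name with its href, in document order.
hrefs = [(tag.name, tag["href"]) for tag in href_tags]
print(hrefs)
```

Here the result includes the link element as well as both a elements, which is the practical difference from passing 'a' as the name.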