BeautifulSoup getting href


Problem description

I have the following:

<a href="some_url">next</a>
<span class="class">...</span>

From this I want to extract the href, "some_url".

I can do it if I only have one tag, but here there are two tags. I can also get the text 'next', but that's not what I want.

Also, is there a good description of the API somewhere, with examples? I'm using the standard documentation, but I'm looking for something a little more organized.

Recommended answer

You can use find_all in the following way to find every a element that has an href attribute, and print each one:

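from bs4 import BeautifulSoup  # BeautifulSoup 4 package

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that actually have an href attribute
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])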

The output would be:

Found the URL: some_url
Found the URL: another_url

Note that if you're using an older version of BeautifulSoup (before version 4), the name of this method is findAll. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all instead.
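
If only the first link is wanted, as with the "some_url" in the question, find may be enough, since it returns the first match (or None when nothing matches). A minimal sketch, assuming BeautifulSoup 4 (the bs4 package) and the built-in html.parser; the variable names first_link and first_href are only illustrative:

```python
from bs4 import BeautifulSoup

# The markup from the question: one link followed by a span
html = '''<a href="some_url">next</a>
<span class="class">...</span>'''

soup = BeautifulSoup(html, "html.parser")

# find() returns the first <a> tag that has an href, or None if there is none
first_link = soup.find("a", href=True)

# Guard against None so a page without links does not raise a TypeError
first_href = first_link["href"] if first_link is not None else None
print(first_href)
```

Indexing the tag like a dictionary (first_link["href"]) reads the attribute value, whereas first_link.text would give the link text 'next'.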

If you want all tags with an href, you can omit the name parameter:

href_tags = soup.find_all(href=True)
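
To make the difference concrete, here is a rough sketch that runs both queries on the same markup. The sample HTML (a link tag, one a tag with an href, and one without) is made up for illustration, and BeautifulSoup 4 with the built-in html.parser is assumed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question
html = '''<link href="style.css" rel="stylesheet">
<a href="some_url">next</a>
<a name="anchor_without_href">skip me</a>'''

soup = BeautifulSoup(html, "html.parser")

# Only <a> tags that carry an href attribute
anchors_only = [tag["href"] for tag in soup.find_all("a", href=True)]

# Any tag that carries an href attribute, whatever its name
any_tag = [(tag.name, tag["href"]) for tag in soup.find_all(href=True)]

print(anchors_only)
print(any_tag)
```

The second a tag is skipped by both queries because it has no href, while the link tag shows up only when the name parameter is omitted.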
