BeautifulSoup: Parse Span Title

Problem Description

I am trying to parse an HTML page. I have successfully reached a sub-area of the HTML DOM tree, but I am stuck at a place where there are span tags.

For example, I initially parse the page as follows:

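        from urllib.request import urlopen  # urllib2.urlopen in Python 2
        from bs4 import BeautifulSoup as bs

        # base_url, user_id and display_name are defined elsewhere
        user_url = base_url + str(user_id) + "/" + display_name
        user_page = urlopen(user_url)
        souping_page = bs(user_page, "html.parser")
        badges = souping_page.body.find('div', attrs={'class': 'badges'})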

badges gives me the following:

<span><span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span><span title="23 silver badges"><span class="badge2"></span><span class="badgecount">23</span></span><span title="43 bronze badges"><span class="badge3"></span><span class="badgecount">43</span></span></span>

But I am trying to extract <span title="3 gold badges"> and all the other span title attributes by traversing the DOM structure. How can I do that in BeautifulSoup?
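To reproduce the setup without fetching the live profile page, the fragment shown above can be parsed from a string. This is a minimal sketch assuming Python 3 with bs4 installed. The wrapping `<html><body><div class="badges">` markup is an assumption inferred from the `find` call, not copied from the real page:

```python
from bs4 import BeautifulSoup

# The span markup printed above, wrapped in the div that the find() call targets.
# The surrounding <html><body><div class="badges"> is assumed, not taken from the real page.
html = (
    '<html><body><div class="badges"><span>'
    '<span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span>'
    '<span title="23 silver badges"><span class="badge2"></span><span class="badgecount">23</span></span>'
    '<span title="43 bronze badges"><span class="badge3"></span><span class="badgecount">43</span></span>'
    '</span></div></body></html>'
)

souping_page = BeautifulSoup(html, "html.parser")
# Same lookup as in the question: the first <div> whose class list contains "badges"
badges = souping_page.body.find('div', attrs={'class': 'badges'})
print(badges.span.span)
```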

Recommended Answer

You can simply do this:

>>> badges.span.span
<span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span>
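Note that `badges.span.span` returns the first titled `<span>` tag itself, not its title text, and only the first one. The following is a minimal sketch, assuming Python 3, bs4, and the same markup as in the question (embedded as a string so it runs offline). It reads the attribute value and collects every `title`, either by searching with `find_all(..., title=True)` or by walking the children of the outer `<span>`:

```python
from bs4 import BeautifulSoup

# Markup from the question, wrapped in an assumed <div class="badges">
html = (
    '<div class="badges"><span>'
    '<span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span>'
    '<span title="23 silver badges"><span class="badge2"></span><span class="badgecount">23</span></span>'
    '<span title="43 bronze badges"><span class="badge3"></span><span class="badgecount">43</span></span>'
    '</span></div>'
)

badges = BeautifulSoup(html, "html.parser").find('div', attrs={'class': 'badges'})

# Attribute access on a Tag works like a dict lookup
first_title = badges.span.span['title']

# Search: every descendant <span> that has a title attribute, in document order
all_titles = [span['title'] for span in badges.find_all('span', title=True)]

# Traversal: the direct children of the outer <span> are the titled spans;
# the name check skips any text nodes (NavigableString has name None)
child_titles = [child.get('title') for child in badges.span.children if child.name == 'span']

print(first_title)
print(all_titles)
print(child_titles)
```

`find_all` searches all descendants, so it keeps working if the titled spans end up nested deeper. The `.children` version relies on the titled spans being direct children of the outer `<span>`, as they are in this markup.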
