beautifulsoup: Parse Span Title
Question
I am trying to parse an HTML page. I have successfully reached the relevant subtree of the HTML DOM, but I am stuck at a point where there are nested span tags.
Example: I initially parse the page as follows:
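import urllib2  # Python 2 standard library
from bs4 import BeautifulSoup as bs  # assumed import; the question does not show where bs comes from
user_url = base_url + str(user_id) + "/" + display_name
user_page = urllib2.urlopen(user_url)
souping_page = bs(user_page)
badges = souping_page.body.find('div', attrs={'class': 'badges'})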
badges then gives me the following:
<span><span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span><span title="23 silver badges"><span class="badge2"></span><span class="badgecount">23</span></span><span title="43 bronze badges"><span class="badge3"></span><span class="badgecount">43</span></span></span>
But I am trying to extract <span title="3 gold badges"> and all the other span title attributes by traversing the DOM structure. How can I do that in BeautifulSoup?
Recommended answer
You can simply do this:
>>> badges.span.span
<span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span>
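Note that badges.span.span returns only the first titled span. To collect every title attribute, as the question asks, one option is to search for all span tags that carry a title. The following is a minimal sketch, not part of the original answer. It assumes the bs4 package (BeautifulSoup 4) on Python 3 and uses the markup from the question as inline sample data; the wrapping div is reconstructed from the find() call in the question.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question. The enclosing
# <div class="badges"> is an assumption based on the find() call above.
html = (
    '<div class="badges"><span>'
    '<span title="3 gold badges"><span class="badge1"></span>'
    '<span class="badgecount">3</span></span>'
    '<span title="23 silver badges"><span class="badge2"></span>'
    '<span class="badgecount">23</span></span>'
    '<span title="43 bronze badges"><span class="badge3"></span>'
    '<span class="badgecount">43</span></span>'
    '</span></div>'
)

soup = BeautifulSoup(html, "html.parser")
badges = soup.find("div", attrs={"class": "badges"})

# title=True matches only the <span> tags that have a title attribute,
# so the inner badge1/badgecount spans are skipped.
titled_spans = badges.find_all("span", title=True)

# Index a tag like a dictionary to read one of its attributes.
titles = [span["title"] for span in titled_spans]
print(titles)
```

With the sample markup this prints ['3 gold badges', '23 silver badges', '43 bronze badges']. If the code runs against the older BeautifulSoup 3 package instead, the method is spelled findAll rather than find_all.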