Get contents by class names using Beautiful Soup
Question
Using the Beautiful Soup module, how can I get the data of a div tag whose class name is feeditemcontent cxfeeditemcontent? Is it:
soup.class['feeditemcontent cxfeeditemcontent']
or:
soup.find_all('class')
This is the HTML source:
<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
        <div class="feeditembody">
            <span>The actual data is some where here</span>
        </div>
    </div>
</div>
and this is the Python code:
from BeautifulSoup import BeautifulSoup
html_doc = open('home.jsp.html', 'r')
soup = BeautifulSoup(html_doc)
class="feeditemcontent cxfeeditemcontent"
Recommended answer
Try this. It may be more than this simple case needs, but it works:
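def match_class(target):
    # Split the space-separated class string into individual class names.
    target = target.split()
    def do_match(tag):
        try:
            # In BeautifulSoup 3, tag.attrs is a list of (name, value) pairs.
            classes = dict(tag.attrs)["class"]
        except KeyError:
            classes = ""
        classes = classes.split()
        # Match only if the tag carries every requested class.
        return all(c in classes for c in target)
    return do_match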
html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
matches = soup.findAll(match_class("feeditemcontent cxfeeditemcontent"))
for m in matches:
    print m
    print "-"*10
matches = soup.findAll(match_class("feeditembody"))
for m in matches:
    print m
    print "-"*10
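The answer above targets the older BeautifulSoup 3 import and Python 2 print statements. For readers on the newer bs4 package (an assumption, since the question does not use it), a minimal sketch of the same lookup might look like the following. It relies on the CSS-selector method select and the class_ keyword of find_all, both part of bs4. The variable names outer, bodies, and text are illustrative only.

```python
from bs4 import BeautifulSoup  # assumes the bs4 package is installed

html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# CSS selector: a div carrying both classes, regardless of their order.
outer = soup.select("div.feeditemcontent.cxfeeditemcontent")

# class_ with a single name matches any tag whose class list contains it.
bodies = soup.find_all("div", class_="feeditembody")

# Collect the text inside the first match, dropping surrounding whitespace.
text = outer[0].get_text(strip=True)
print(text)
```

The selector form keeps the "must have all of these classes" rule that match_class implements by hand, so the helper function is not needed in this setting.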