BeautifulSoup: get CSS classes from HTML
Is there a way to get CSS classes from an HTML file using BeautifulSoup? Example snippet:
<style type="text/css">
p.c3 {text-align: justify}
p.c2 {text-align: left}
p.c1 {text-align: center}
</style>
Perfect output would be:
cssdict = {
'p.c3': {'text-align':'justify'},
'p.c2': {'text-align': 'left'},
'p.c1': {'text-align': 'center'}
}
although something like this would do:
L = [
('p.c3', {'text-align': 'justify'}),
('p.c2', {'text-align': 'left'}),
('p.c1', {'text-align': 'center'})
]
BeautifulSoup itself doesn't parse CSS style declarations at all, but you can extract such sections then parse them with a dedicated CSS parser.
Depending on your needs, there are several CSS parsers available for Python; I'd pick cssutils (requires Python 2.5 or up, including Python 3): it is the most complete in its support, and it supports inline styles too.
Other options are css-py and tinycss.
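To grab and parse all such style sections (example with cssutils):
import cssutils

# `tree` is the BeautifulSoup object for the parsed HTML document
sheets = []
for styletag in tree.findAll('style', type='text/css'):
    if not styletag.string:  # probably an external sheet
        continue
    # parseString handles a whole stylesheet; parseStyle expects only a bare declaration block
    sheets.append(cssutils.parseString(styletag.string))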
With cssutils you can then combine these, resolve imports, and even have it fetch external stylesheets.