BeautifulSoup: get CSS classes from HTML

Question

Is there a way to get CSS classes from an HTML file using BeautifulSoup? Example snippet:

<style type="text/css">
 p.c3 {text-align: justify}
 p.c2 {text-align: left}
 p.c1 {text-align: center}
</style>

Perfect output would be:

cssdict = {
    'p.c3': {'text-align': 'justify'},
    'p.c2': {'text-align': 'left'},
    'p.c1': {'text-align': 'center'}
}

although something like this would do:

L = [
    ('p.c3', {'text-align': 'justify'}),
    ('p.c2', {'text-align': 'left'}),
    ('p.c1', {'text-align': 'center'})
]

Solution

BeautifulSoup itself doesn't parse CSS style declarations at all, but you can extract such sections then parse them with a dedicated CSS parser.

Depending on your needs, there are several CSS parsers available for Python. I'd pick cssutils (requires Python 2.5 or later, including Python 3): it is the most complete in its support, and it supports inline styles too.

Other options are css-py and tinycss.

To grab and parse all such style sections (example with cssutils):

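import cssutils

# `tree` is the BeautifulSoup object for the parsed HTML document
sheets = []
for styletag in tree.find_all('style', type='text/css'):
    if not styletag.string:  # probably an external sheet
        continue
    # parseString parses a whole stylesheet; parseStyle only handles
    # a bare declaration block such as a style="..." attribute value
    sheets.append(cssutils.parseString(styletag.string))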

With cssutils you can then combine these sheets, resolve imports, and even have it fetch external stylesheets.
