How to parse a web page containing CSS and HTML using Python
Question
I am trying to parse and extract some information from a web page that contains CSS as well as HTML. I am using cssutils and BeautifulSoup for this. Let's say I want to find out the font size used for a table heading. BeautifulSoup tells me where the table is defined in the HTML. But if I want to know which style is applied to the table, can I get that information from BeautifulSoup? If not, how do I go about solving this problem? Thanks for any help.
Recommended answer
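Yes, you can. BeautifulSoup is the perfect choice, and combined with regular expressions it is very powerful :) Note that this reads the element's inline style attribute; rules defined in a <style> block or an external stylesheet are not attached to the element and need a CSS parser such as cssutils.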
Example:
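import re
from bs4 import BeautifulSoup  # BeautifulSoup 4 (the old "from BeautifulSoup import ..." is version 3, Python 2 only)

# The closing tag must be </h1>; name the parser explicitly to avoid a warning
soup = BeautifulSoup('<h1 style="font-size: 12px; margin: 5px">Test</h1>', 'html.parser')
# The inline style attribute comes back as a plain string
style = soup.find('h1')['style']
# Match the font-size declaration up to the next semicolon
re.findall(r'font-size[^;]+', style)
# ['font-size: 12px']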
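The regular expression above returns the whole declaration ('font-size: 12px'), so getting just the value takes another split. As a minimal sketch using only the standard library, the style string can be split into a dictionary instead. Here parse_inline_style is a hypothetical helper, not part of BeautifulSoup, and it assumes no semicolons appear inside quoted values or url(...). The style string is written out literally; in practice it would come from soup.find('h1')['style'].

```python
def parse_inline_style(style):
    """Split an inline style string into a {property: value} dict (hypothetical helper)."""
    declarations = {}
    for chunk in style.split(';'):
        # Skip empty pieces, such as the one left after a trailing semicolon
        if ':' not in chunk:
            continue
        # Split on the first colon only, so values containing colons stay intact
        name, value = chunk.split(':', 1)
        declarations[name.strip().lower()] = value.strip()
    return declarations


# Same style string as in the example above
style = 'font-size: 12px; margin: 5px'
props = parse_inline_style(style)
print(props.get('font-size'))
```

Looking a property up with props.get(...) returns None when the element does not set it inline, which is a hint that the value may come from a stylesheet rule instead.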