Really simple way to deal with XML in Python?
Musing over a recently asked question, I started to wonder if there is a really simple way to deal with XML documents in Python. A Pythonic way, if you will.
Perhaps I can explain best with an example. Let's say the following is the response I get from an HTTP request to http://www.google.com/ig/api?weather=94043 (I think it is a good example of how XML is (mis)used in web services):
<xml_api_reply version="1">
  <weather module_id="0" tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0" >
    <forecast_information>
      <city data="Mountain View, CA"/>
      <postal_code data="94043"/>
      <latitude_e6 data=""/>
      <longitude_e6 data=""/>
      <forecast_date data="2010-06-23"/>
      <current_date_time data="2010-06-24 00:02:54 +0000"/>
      <unit_system data="US"/>
    </forecast_information>
    <current_conditions>
      <condition data="Sunny"/>
      <temp_f data="68"/>
      <temp_c data="20"/>
      <humidity data="Humidity: 61%"/>
      <icon data="/ig/images/weather/sunny.gif"/>
      <wind_condition data="Wind: NW at 19 mph"/>
    </current_conditions>
    ...
    <forecast_conditions>
      <day_of_week data="Sat"/>
      <low data="59"/>
      <high data="75"/>
      <icon data="/ig/images/weather/partly_cloudy.gif"/>
      <condition data="Partly Cloudy"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>
After loading and parsing such a document, I would like to be able to access the information as simply as, say:
>>> xml['xml_api_reply']['weather']['forecast_information']['city'].data
'Mountain View, CA'
or
>>> xml.xml_api_reply.weather.current_conditions.temp_f['data']
'68'
From what I have seen so far, ElementTree seems to be the closest to what I dream of. But it's not quite there; there is still some fumbling to do when consuming XML. OTOH, what I have in mind is not that complicated - probably just a thin veneer on top of a parser - and yet it could reduce the annoyance of dealing with XML. Is there such magic? (And if not, why?)
PS. Note that I have already tried BeautifulSoup, and while I like its approach, it has real issues with empty <element/>s - see the comments below for examples.
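For comparison, this is roughly what the same two lookups look like with plain ElementTree. It is only a minimal sketch: the REPLY string is a trimmed copy of the response above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the weather reply shown above (assumption: only the
# elements needed for the two lookups are kept).
REPLY = """<xml_api_reply version="1">
  <weather module_id="0">
    <forecast_information>
      <city data="Mountain View, CA"/>
    </forecast_information>
    <current_conditions>
      <temp_f data="68"/>
    </current_conditions>
  </weather>
</xml_api_reply>"""

root = ET.fromstring(REPLY)  # root is the <xml_api_reply> element

# Paths are relative to the root, so the root tag is not part of them.
city = root.find("weather/forecast_information/city").get("data")
temp_f = root.find("weather/current_conditions/temp_f").get("data")

# find() returns None for a missing element, so chained calls need a guard.
missing = root.find("weather/no_such_element")

print(city)
print(temp_f)
print(missing)
```

It works, but the path strings and the None checks are the kind of fumbling meant above.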
You want a thin veneer? That's easy to cook up. Try the following trivial wrapper around ElementTree as a start:
# geetree.py
import xml.etree.ElementTree as ET


class GeeElem(object):
    """Wrapper around an ElementTree element. a['foo'] gets the
    attribute foo, a.foo gets the first subelement foo."""

    def __init__(self, elem):
        self.etElem = elem

    def __getitem__(self, name):
        # Item access looks up an XML attribute, so a miss is a KeyError.
        res = self._getattr(name)
        if res is None:
            raise KeyError("No attribute named '%s'" % name)
        return res

    def __getattr__(self, name):
        # Attribute access looks up a child element; raising AttributeError
        # keeps hasattr() and getattr() with a default working as expected.
        res = self._getelem(name)
        if res is None:
            raise AttributeError("No element named '%s'" % name)
        return res

    def _getelem(self, name):
        res = self.etElem.find(name)
        if res is None:
            return None
        return GeeElem(res)

    def _getattr(self, name):
        return self.etElem.get(name)


class GeeTree(object):
    "Wrapper around an ElementTree."

    def __init__(self, fname):
        self.doc = ET.parse(fname)

    def __getattr__(self, name):
        # Only the root element is reachable directly from the tree.
        if self.doc.getroot().tag != name:
            raise AttributeError("No element named '%s'" % name)
        return GeeElem(self.doc.getroot())

    def getroot(self):
        return self.doc.getroot()
You invoke it like so:
>>> import geetree
>>> t = geetree.GeeTree('foo.xml')
>>> t.xml_api_reply.weather.forecast_information.city['data']
'Mountain View, CA'
>>> t.xml_api_reply.weather.current_conditions.temp_f['data']
'68'
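Two things the wrapper above does not cover are parsing a reply that arrives as a string rather than a file, and reaching repeated elements such as the several forecast_conditions blocks. The following is one possible sketch of both, not part of the original wrapper: the names Node, children, parse_string and SAMPLE are hypothetical, and the second forecast entry in SAMPLE is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, trimmed from the weather reply; the "Sun" entry is
# invented so that there are two <forecast_conditions> blocks to iterate over.
SAMPLE = """<xml_api_reply version="1">
  <weather>
    <forecast_information>
      <city data="Mountain View, CA"/>
    </forecast_information>
    <forecast_conditions>
      <day_of_week data="Sat"/>
      <high data="75"/>
    </forecast_conditions>
    <forecast_conditions>
      <day_of_week data="Sun"/>
      <high data="77"/>
    </forecast_conditions>
  </weather>
</xml_api_reply>"""


class Node:
    """node.foo -> first child <foo>; node['bar'] -> XML attribute bar."""

    def __init__(self, elem):
        self._elem = elem

    def __getattr__(self, name):
        # Called only when normal lookup fails. Private names are never
        # treated as element names, which avoids recursion on self._elem.
        if name.startswith("_"):
            raise AttributeError(name)
        child = self._elem.find(name)
        if child is None:
            raise AttributeError("No element named %r" % name)
        return Node(child)

    def __getitem__(self, name):
        value = self._elem.get(name)
        if value is None:
            raise KeyError(name)
        return value

    def children(self, name):
        """Return every direct child called `name`, wrapped, in document order."""
        return [Node(child) for child in self._elem.findall(name)]


def parse_string(text):
    """Wrap a document given as a string so that doc.<root tag> works."""
    root = ET.fromstring(text)
    holder = ET.Element("document")  # synthetic parent of the real root
    holder.append(root)
    return Node(holder)


doc = parse_string(SAMPLE)
print(doc.xml_api_reply.weather.forecast_information.city["data"])

weather = doc.xml_api_reply.weather
days = [(f.day_of_week["data"], f.high["data"])
        for f in weather.children("forecast_conditions")]
print(days)
```

One trade-off of this design is that a child element literally named children would be shadowed by the method; it would still be reachable through the underlying ElementTree element.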