python lxml findall with multiple namespaces
Question
I'm trying to parse an XML document with multiple namespaces with lxml, and I'm stuck on getting the findall() method to return something.
My XML:
<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
    <HistoryRecords>
        <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
    <HistoryRecords>
</MeasurementRecords>
My code:
from lxml import etree
from pprint import pprint

RSPxmlFile = '/home/user/Desktop/100_0000100004_3788_20160420144011263_records.xml'

with open(RSPxmlFile, 'rt') as f:
    tree = etree.parse(f)

root = tree.getroot()

# The findall() call below is the one that raises the ValueError.
for node in tree.findall('MeasurementRecords', root.nsmap):
    print(node)
    print("parameter =", node.text)
This gives:
ValueError: empty namespace prefix is not supported in ElementPath
After reading up on this, I tried some experiments:
>>> root.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.company.com/common/rsp/2012/07'}
>>> nsmap = root.nsmap
>>> nsmap['foo'] = nsmap[None]
>>> nsmap.pop(None)
'http://www.company.com/common/rsp/2012/07'
>>> nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'foo': 'http://www.company.com/common/rsp/2012/07'}
>>> tree.xpath("//MeasurementRecords", namespaces=nsmap)
[]
>>> tree.xpath('/foo:MeasurementRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>]
>>> tree.xpath('/foo:MeasurementRecords/HistoryRecords', namespaces=nsmap)
[]
But this doesn't seem to help.
So, more experiments:
>>> tree.findall('//{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]
>>> print(root)
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
>>> print(tree)
<lxml.etree._ElementTree object at 0x6ffffda5368>
>>> for node in tree.iter():
...     print(node)
...
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x6ffffda5cf8>
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x6ffffda5f38>
...etc...
>>> tree.findall("//HistoryRecords", namespaces=nsmap)
[]
>>> tree.findall("//foo:MeasurementRecords/HistoryRecords", namespaces=nsmap)
[]
I'm stumped. I have no idea what's wrong.
Recommended answer
If you start with this:
>>> tree = etree.parse(open('data.xml'))
>>> root = tree.getroot()
>>>
This will fail to find any elements...
>>> root.findall('{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]
...but that's because root is a MeasurementRecords element; it does not contain any MeasurementRecords elements. On the other hand, the following works just fine:
>>> root.findall('{http://www.company.com/common/rsp/2012/07}HistoryRecords')
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
>>>
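The difference between the two calls above comes down to where find and findall look: a bare qualified name matches only direct children of the element you call it on, while a leading ".//" searches all descendants. Here is a minimal sketch of that behaviour on a trimmed-down copy of the XML from the question. The variable names are mine, and the import falls back to the standard library's ElementTree (which accepts the same paths) purely so the snippet runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability: the standard library accepts the same find/findall paths.
    import xml.etree.ElementTree as etree

NS = "http://www.company.com/common/rsp/2012/07"

# Trimmed-down copy of the document from the question.
XML = (
    '<MeasurementRecords xmlns="%s">'
    "<HistoryRecords><List><HistoryRecord>"
    "<Value>60</Value>"
    "</HistoryRecord></List></HistoryRecords>"
    "</MeasurementRecords>" % NS
)

root = etree.fromstring(XML)

# root *is* the MeasurementRecords element, so it has no child of that name.
own_tag = root.findall("{%s}MeasurementRecords" % NS)

# HistoryRecords is a direct child of root, so a plain qualified name finds it.
children = root.findall("{%s}HistoryRecords" % NS)

# Value is nested several levels down: a child-only search from root misses it...
nested_direct = root.findall("{%s}Value" % NS)

# ...while ".//" searches every descendant of root.
nested_any = root.findall(".//{%s}Value" % NS)

print(len(own_tag), len(children), len(nested_direct), len(nested_any))
```

So if you want elements that are not immediate children, either spell out the full path from the element you are searching or start the path with ".//".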
Using the xpath method, you could do something like this:
>>> nsmap={'a': 'http://www.company.com/common/rsp/2012/07',
... 'b': 'http://www.w3.org/2001/XMLSchema-instance'}
>>> root.xpath('//a:HistoryRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
So:
- The findall and find methods match on fully qualified names, written in the {...namespace...}ElementName syntax.
- The xpath method requires namespace prefixes (ns:ElementName), which it looks up in the provided namespaces map. The prefix doesn't have to match the prefix used in the original document, but the namespace URL must match.
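As a side note, find and findall also accept a namespaces mapping, so prefixes can be used there too, provided every key in the mapping is a real prefix and every step of the path carries one. The sketch below starts from the mapping that root.nsmap reported in the question and swaps the None key for a made-up prefix; "rsp" is just a name I picked, and the import fallback to the standard library is only there so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability: the standard library accepts the same find/findall paths.
    import xml.etree.ElementTree as etree

NS = "http://www.company.com/common/rsp/2012/07"

XML = (
    '<MeasurementRecords xmlns="%s">'
    "<HistoryRecords>"
    "<ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>"
    "</HistoryRecords>"
    "</MeasurementRecords>" % NS
)

root = etree.fromstring(XML)

# The mapping shown for root.nsmap in the question, including the None key.
reported_nsmap = {
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    None: NS,
}

# Give the default namespace an explicit prefix ("rsp" is a hypothetical name).
ns = {("rsp" if prefix is None else prefix): url
      for prefix, url in reported_nsmap.items()}

# Prefixed path: each step is looked up in the mapping.
by_prefix = root.findall("rsp:HistoryRecords/rsp:ValueItemId", namespaces=ns)

# Equivalent path with the namespace URL written out.
by_clark = root.findall("{%s}HistoryRecords/{%s}ValueItemId" % (NS, NS))

# Forgetting the prefix on one step means that step is matched without a namespace.
missing_prefix = root.findall("rsp:HistoryRecords/ValueItemId", namespaces=ns)

print(len(by_prefix), len(by_clark), len(missing_prefix))
```

The last lookup is the same mistake as the '/foo:MeasurementRecords/HistoryRecords' attempt in the question: the first step is qualified, the second is not, so nothing matches.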
This works:
>>> root.find('{http://www.company.com/common/rsp/2012/07}HistoryRecords/{http://www.company.com/common/rsp/2012/07}ValueItemId')
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0332a70>
This also works:
>>> root.xpath('/a:MeasurementRecords/a:HistoryRecords/a:ValueItemId',namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0330830>]
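Writing the namespace URL into every step gets tedious quickly, so one option is a small helper that builds the qualified path for you. This is only a sketch: q() is a hypothetical helper of mine, not part of lxml, the XML is a trimmed copy of the sample in the question, and the import falls back to the standard library so the snippet runs without lxml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability: the standard library accepts the same find/findall paths.
    import xml.etree.ElementTree as etree

NS = "http://www.company.com/common/rsp/2012/07"

XML = (
    '<MeasurementRecords xmlns="%s">'
    "<HistoryRecords>"
    "<ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>"
    "<List><HistoryRecord>"
    "<Value>60</Value><State>Valid</State>"
    "<TimeStamp>2016-04-20T12:40:00Z</TimeStamp>"
    "</HistoryRecord></List>"
    "</HistoryRecords>"
    "</MeasurementRecords>" % NS
)


def q(*tags):
    """Hypothetical helper: join tag names into a path of {namespace}Tag steps."""
    return "/".join("{%s}%s" % (NS, tag) for tag in tags)


root = etree.fromstring(XML)

records = []
for history in root.findall(q("HistoryRecords")):
    # findtext returns the text of the first match, or None if nothing matches.
    item_id = history.findtext(q("ValueItemId"))
    for rec in history.findall(q("List", "HistoryRecord")):
        records.append({
            "item": item_id,
            "value": rec.findtext(q("Value")),
            "state": rec.findtext(q("State")),
            "timestamp": rec.findtext(q("TimeStamp")),
        })

print(records)
```

Here q("HistoryRecords", "ValueItemId") produces the same path string as the find() call shown above.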