Retrieve attribute names and values with Python / lxml and XPath
Problem description
I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest, and one to extract values from them. Here is a sample of the kind of code I use.
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""

parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    # Single-argument print(...) behaves the same on Python 2 and Python 3
    print(node.xpath("@id|@height|@weight"))
When I run this script, the output is:
['1', '160', '80']
['2', '70']
['3', '140']
As the result shows, when an attribute is missing the positions of the remaining values shift, so in rows 2 and 3 I cannot tell whether the second value is the height or the weight.
Is there a way to get the names of the attributes returned from etree/lxml? Ideally, the result would be in this format:
[('@id', '1'), ('@height', '160'), ('@weight', '80')]
I recognise that I can solve this specific case using elementtree and Python. However, I would like to solve it with XPath (and relatively simple XPath expressions) rather than by processing the data in Python.
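One option that keeps the XPath from the question unchanged: the values lxml returns for attribute selections are "smart strings", which remember where they came from and expose the attribute name as attrname. The following is a minimal sketch, assuming a reasonably recent lxml release and attributes without namespaces. The helper name attribute_pairs is made up for illustration and is not part of lxml.

```python
from lxml import etree

XML = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""


def attribute_pairs(node, expr="@id|@height|@weight"):
    """Hypothetical helper: pair each selected attribute value with its name."""
    # Each result is an lxml smart string; .attrname holds the attribute name.
    return [("@" + value.attrname, str(value)) for value in node.xpath(expr)]


rows = etree.fromstring(XML).xpath("/records/row")
for row in rows:
    print(attribute_pairs(row))
```

For the first row this yields the pairs in the format requested above, and for rows 2 and 3 the name travels with each value, so a missing attribute no longer makes the result ambiguous.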
Recommended answer
I was wrong to assert that I was not going to use Python. I found that the lxml/etree implementation is easily extended, so that I can use the XPath DSL with modifications.
I registered a function named "dictify" and changed the XPath expression to:
dictify('@id|@height|@weight|weight|height')
The new code is:
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" ><height>150</height></row>
  <row id="3" height="140" />
</records>
"""

def dictify(context, names):
    # lxml passes the XPath evaluation context as the first argument
    node = context.context_node
    rv = ['__dictify_start_marker__']
    for n in names.split('|'):
        if n.startswith('@'):
            # Attribute name: emit it only if the attribute is present
            val = node.attrib.get(n[1:])
            if val is not None:
                rv.append(n)
                rv.append(val)
        else:
            # Element name: emit one name/text pair per matching child
            for child_node in node.findall(n):
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv

# Register dictify in the default (un-prefixed) XPath function namespace
etree_functions = etree.FunctionNamespace(None)
etree_functions['dictify'] = dictify

parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print(node.xpath("dictify('@id|@height|@weight|weight|height')"))
This produces the following output:
['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
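The flat, marker-delimited lists can be regrouped into the (name, value) pairs the question asked for. This is a small sketch in plain Python. The helper name pairs_from_dictify is hypothetical, and it assumes each list has exactly the shape shown above: a start marker, alternating names and values, then an end marker.

```python
START = '__dictify_start_marker__'
END = '__dictify_end_marker__'


def pairs_from_dictify(flat):
    """Hypothetical helper: turn [START, name, value, ..., END] into pairs."""
    if len(flat) < 2 or flat[0] != START or flat[-1] != END:
        raise ValueError('expected a list wrapped in dictify markers')
    body = flat[1:-1]
    if len(body) % 2:
        raise ValueError('names and values must come in pairs')
    # Even positions hold names, odd positions hold the matching values.
    return list(zip(body[0::2], body[1::2]))


row2 = ['__dictify_start_marker__', '@id', '2', '@weight', '70',
        'height', '150', '__dictify_end_marker__']
print(pairs_from_dictify(row2))
# [('@id', '2'), ('@weight', '70'), ('height', '150')]
```

Checking the markers first means a result that did not come from dictify fails loudly instead of being paired up incorrectly.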