Retrieve attribute names and values with Python / lxml and XPath

Question

I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest, and one to extract values from them. Here is a sample of the kind of code involved.

from lxml import etree

xml = """
  <records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" />
    <row id="3" height="140" />
  </records>
"""

parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print(node.xpath("@id|@height|@weight"))

When I run this script the output is:

['1', '160', '80']
['2', '70']
['3', '140']

As the result shows, when an attribute is missing the positions of the remaining attributes shift, so for rows 2 and 3 I cannot tell whether the second value is the height or the weight.

Is there a way to get the names of the attributes returned from etree/lxml? Ideally, I would get a result in the format:

[('@id', '1'), ('@height', '160'), ('@weight', '80')]

I recognise that I can solve this specific case using ElementTree and Python. However, I wish to resolve this using XPath (and relatively simple XPath expressions), rather than processing the data in Python.

Recommended answer

I was wrong in my assertion that I was not going to use Python. I found that the lxml/etree implementation is easily extended, so I can keep using the XPath DSL with modifications.
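For the attribute-only case, a lighter option may be worth trying before writing an extension function. lxml returns XPath string results as "smart strings", and for attribute matches these carry an `attrname` property and an `is_attribute` flag. The sketch below relies on that; the helper name `named_attributes` is made up for illustration and is not part of lxml or of the original post.

```python
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""

parsed = etree.fromstring(xml)


def named_attributes(node, expr):
    """Return (name, value) pairs for the attribute results of an XPath expression.

    Hypothetical helper: it only handles results that are attribute matches.
    """
    pairs = []
    for result in node.xpath(expr):
        # Smart strings produced by attribute matches expose
        # is_attribute and attrname; other results are skipped.
        if getattr(result, "is_attribute", False):
            pairs.append(("@" + result.attrname, str(result)))
    return pairs


rows = [named_attributes(node, "@id|@height|@weight")
        for node in parsed.xpath("/records/row")]
for row in rows:
    print(row)
```

Each row becomes a list of name/value tuples, so a missing attribute no longer shifts the meaning of the remaining values. This does not cover child elements such as `<height>`, which is what the custom function handles.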

I registered an XPath extension function named "dictify" and changed the XPath expression to:

dictify('@id|@height|@weight|weight|height')

The new code is:

from lxml import etree

xml = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" ><height>150</height></row>
    <row id="3" height="140" />
</records>
"""

def dictify(context, names):
    # lxml passes the XPath evaluation context as the first argument.
    node = context.context_node
    rv = ['__dictify_start_marker__']
    for n in names.split('|'):
        if n.startswith('@'):
            # Attribute name: emit the pair only if the attribute is present.
            val = node.attrib.get(n[1:])
            if val is not None:
                rv.append(n)
                rv.append(val)
        else:
            # Element name: emit one pair per matching child element.
            for child_node in node.findall(n):
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv

etree_functions = etree.FunctionNamespace(None)
etree_functions['dictify'] = dictify


parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print(node.xpath("dictify('@id|@height|@weight|weight|height')"))

This produces the following output:

['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
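
The flat lists above alternate names and values between the two markers, so they can be folded into the tuple format the question asked for. The following is a minimal sketch of that post-processing step; the helper name `pair_up` and the constants `START` and `END` are hypothetical, introduced only for this illustration.

```python
START = "__dictify_start_marker__"
END = "__dictify_end_marker__"


def pair_up(flat):
    """Turn [START, name, value, name, value, ..., END] into [(name, value), ...]."""
    if len(flat) < 2 or flat[0] != START or flat[-1] != END:
        raise ValueError("missing dictify markers")
    body = flat[1:-1]
    if len(body) % 2:
        raise ValueError("odd number of name/value items")
    # Even positions hold names, odd positions hold the matching values.
    return list(zip(body[0::2], body[1::2]))


# Second row of the output above, copied as a plain list.
sample = ['__dictify_start_marker__', '@id', '2', '@weight', '70',
          'height', '150', '__dictify_end_marker__']
pairs = pair_up(sample)
print(pairs)
```

Passing the pairs to `dict()` would give a mapping instead, at the cost of keeping only the last value when a name repeats, for example with several `<height>` children.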
