LXML - Sorting Tag Order


Question

I have a legacy file format which I'm converting into XML for processing. The structure can be summarised as:

<A>
    <A01>X</A01>
    <A02>Y</A02>
    <A03>Z</A03>
</A>

The numerical part of the tags can go from 01 to 99 and there may be gaps. As part of the processing certain records may have additional tags added. After the processing is completed I'm converting the file back to the legacy format by iterwalking the tree. The files are reasonably large (~150,000 nodes).

One problem is that some software that uses the legacy format assumes the tags (or rather the fields, by the time the file is converted) will be in alphanumeric order. By default, however, new tags are appended to the end of the branch, so they come out of the iterator in the wrong order.
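
To make the ordering problem concrete, here is a minimal sketch: a record with a gap at A02 gets the missing field added afterwards, and a walk over the tree (as an export step using lxml's `etree.iterwalk` might do) yields it last. The record contents are illustrative only, and collecting leaf tags stands in for whatever the real legacy writer does.

```python
from lxml import etree

# Illustrative record with a gap at A02, as it might look before processing
record = etree.XML("<A><A01>X</A01><A03>Z</A03></A>")

# SubElement always appends the new field as the last child of <A>
new_field = etree.SubElement(record, "A02")
new_field.text = "Y"

# Walk the tree in document order, as an export step would,
# collecting the tag of every leaf (field) element
leaf_tags = []
for event, element in etree.iterwalk(record, events=("start",)):
    if len(element) == 0:
        leaf_tags.append(element.tag)

print(leaf_tags)  # ['A01', 'A03', 'A02']
```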

I can use XPath to find the preceding sibling based on the tag name each time I add a new tag, but is there a simpler way to sort the whole tree at once, just before export?

I think I've over-summarised the structure.

A record can contain several levels as described above to give something like:

<X>
    <X01>1</X01>
    <X02>2</X02>
    <X03>3</X03>
    <A>
        <A01>X</A01>
        <A02>Y</A02>
        <A03>Z</A03>
    </A>
    <B>
        <B01>Z</B01>
        <B02>X</B02>
        <B03>C</B03>
    </B>
</X>

Answer

It's possible to write a helper function to insert a new element in the correct place, but without knowing more about the structure it's difficult to make it generic.
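
As a rough sketch of such a helper, assuming (as in the question) that field numbers are always zero-padded to two digits, so plain string order matches numeric order: the function below, `insert_in_order`, is a hypothetical name, not part of lxml. It places the new element before the first existing child whose tag sorts after it, and appends it otherwise.

```python
from lxml import etree

def insert_in_order(parent, tag, text=None):
    """Hypothetical helper: create <tag> under parent, placed before the
    first existing child whose tag sorts after it (plain string order)."""
    new_child = etree.Element(tag)
    new_child.text = text
    for index, child in enumerate(parent):
        # Skip comments/PIs, whose .tag is not a string
        if isinstance(child.tag, str) and child.tag > tag:
            parent.insert(index, new_child)
            return new_child
    # No later sibling found: the new element belongs at the end
    parent.append(new_child)
    return new_child

record = etree.XML("<A><A01>X</A01><A03>Z</A03></A>")
insert_in_order(record, "A02", "Y")
print(etree.tostring(record).decode())
# <A><A01>X</A01><A02>Y</A02><A03>Z</A03></A>
```

This keeps each branch ordered as it is built, at the cost of a linear scan over the siblings on every insert; with at most 99 fields per level that scan is short.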

Here's a short example of sorting child elements across the whole document:

from lxml import etree

data = """<X>
    <X03>3</X03>
    <X02>2</X02>
    <A>
        <A02>Y</A02>
        <A01>X</A01>
        <A03>Z</A03>
    </A>
    <X01>1</X01>
    <B>
        <B01>Z</B01>
        <B02>X</B02>
        <B03>C</B03>
    </B>
</X>"""

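# remove_blank_text drops the indentation whitespace so that
# pretty_print can re-indent the sorted tree cleanly
doc = etree.XML(data, etree.XMLParser(remove_blank_text=True))

# Select every element that has at least one child element
for parent in doc.xpath('//*[./*]'):
    # Replace the children with the same nodes, sorted by tag name
    parent[:] = sorted(parent, key=lambda x: x.tag)

# tostring() returns bytes in Python 3, so decode before printing
print(etree.tostring(doc, pretty_print=True).decode())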

Yields:

<X>
  <A>
    <A01>X</A01>
    <A02>Y</A02>
    <A03>Z</A03>
  </A>
  <B>
    <B01>Z</B01>
    <B02>X</B02>
    <B03>C</B03>
  </B>
  <X01>1</X01>
  <X02>2</X02>
  <X03>3</X03>
</X>
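
One caveat with `key=lambda x: x.tag`: iterating over an element also yields comments and processing instructions, and in lxml their `.tag` is not a string (for a comment it is the `etree.Comment` factory), so `sorted()` raises a `TypeError` if a parent contains one. A minimal variant, assuming such nodes may occur in the converted files, gives them an empty-string key. The choice to sort them before named elements is arbitrary and is only one option.

```python
from lxml import etree

# Illustrative document with a comment among the fields
data = "<A><!-- legacy note --><A02>Y</A02><A01>X</A01></A>"
doc = etree.XML(data)

def sort_key(node):
    # Comments and processing instructions have a non-string .tag;
    # map them to "" so they sort before the named elements
    return node.tag if isinstance(node.tag, str) else ""

for parent in doc.xpath('//*[./*]'):
    parent[:] = sorted(parent, key=sort_key)

print(etree.tostring(doc).decode())
# <A><!-- legacy note --><A01>X</A01><A02>Y</A02></A>
```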
