Python lxml: using iterparse to edit and output XML

Question

I've been experimenting with the lxml library for a while, and maybe I'm misunderstanding it or missing something, but I can't figure out how to edit the document when I reach a certain XPath and then write the result back out as XML while parsing element by element.

Say we have this XML as an example:

<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>

While parsing, whenever I hit the XPath "/xml/items/pie", I would like to wrap the pie in a new item element, so the result looks like this:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
  </items>
</xml>

The output needs to be written to a file incrementally, as I hit each tag and edit the XML at certain XPaths. I could print the start tag, the text, the attributes if any, and then the end tag by hard-coding certain parts, but that would be very messy, and it would be nice to avoid that if possible.

Here is my attempt at the code:

from lxml import etree

path=[]
count=0

context=etree.iterparse(file,events=('start','end'))
for event, element in context:
    if event=='start':
       path.append(element.tag)
       if '/'+'/'.join(path)=='/xml/items/pie':
          itemnode=etree.Element('item',id=str(count))
          itemnode.text=""
          element.addprevious(itemnode)#Not the right way to do it of course
          #write/print out xml here.
    else:
        element.clear()
        path.pop()

Also, I need to run through fairly big files, so I have to use iterparse.

Recommended answer

Here's a solution using iterparse(). The idea is to catch all tag "start" events, remember the parent (items) tag, then for every pie tag create an item tag and put the pie into it:

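from io import BytesIO
from lxml import etree
from lxml.etree import Element

# The sample document, as bytes so iterparse can read it from a BytesIO
data = b"""<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""

stream = BytesIO(data)
context = etree.iterparse(stream, events=("start", ))

for action, elem in context:
    if elem.tag == 'items':
        # Remember the parent and restart the id counter
        items = elem
        index = 1
    elif elem.tag == 'pie':
        # Swap the pie for a new item, then move the pie inside it
        item = Element('item', {'id': str(index)})
        items.replace(elem, item)
        item.append(elem)
        index += 1

# tostring() returns bytes on Python 3, so decode before printing
print(etree.tostring(context.root).decode())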

This prints:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
   </items>
</xml>
