How to tell lxml.etree.tostring(element) not to write namespaces in Python?

Question

I have a huge XML file (1 GB). I want to move some of its elements (entries) to another file with the same header and specifications.

Let's say the original file contains this entry with the tag <to_move>:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE some SYSTEM "some.dtd">
<some>
...
<to_move date="somedate">
    <child>some text</child>
    ...
...
</to_move>
...
</some>

I use lxml.etree.iterparse to iterate through the file, which works fine. When I find the element with the tag <to_move> (let's assume it is stored in the variable element), I do

new_file.write(etree.tostring(element))

But this results in

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE some SYSTEM "some.dtd">
<some>
...
<to_move xmlns:="some" date="somedate">  # <---- Here is the problem. I don't want the namespace.
    <child>some text</child>
    ...
...
</to_move>
...
</some>

So the question is: how can I tell etree.tostring() not to write the xmlns:="some" declaration? Is this possible? I struggled with the API documentation of lxml.etree, but I couldn't find a satisfying answer.

This is what I found for etree.tostring:

tostring(element_or_tree, encoding=None, method="xml",
xml_declaration=None, pretty_print=False, with_tail=True,
standalone=None, doctype=None, exclusive=False, with_comments=True)

Serialize an element to an encoded string representation of its XML tree.

To me, none of the parameters of tostring() seems to help. Any suggestions or corrections?

Recommended answer

I often grab a namespace to make an alias for it like this:

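import lxml.etree

someXML = lxml.etree.XML(someString)
ns = None  # no namespace alias has been set up yet
if ns is None:
    # The root tag looks like "{namespace-uri}localname"; keep only the URI.
    ns = {"m": someXML.tag.split("}")[0][1:]}
someid = someXML.xpath('.//m:ImportantThing//m:ID', namespaces=ns)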

You could do something similar to grab the namespace, then build a regex from it that cleans up the string after calling tostring.
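A minimal sketch of that regex cleanup, using only the standard re module. The helper name strip_namespace_declaration is hypothetical, and the sketch assumes the serialized element is already a str (tostring() returns bytes by default, so it would need decoding first) and that the namespace URI has been extracted as in the snippet above:

```python
import re

def strip_namespace_declaration(xml_text, namespace_uri):
    """Remove xmlns declarations bound to namespace_uri (hypothetical helper)."""
    # Matches ' xmlns="uri"' and ' xmlns:prefix="uri"', with either quote style.
    pattern = re.compile(
        r'\s+xmlns(?::[\w.-]+)?\s*=\s*(["\'])' + re.escape(namespace_uri) + r'\1'
    )
    return pattern.sub("", xml_text)

# Stand-in for the text produced by tostring(element); not real lxml output.
serialized = '<to_move xmlns="some" date="somedate"><child>some text</child></to_move>'
cleaned = strip_namespace_declaration(serialized, "some")
print(cleaned)
```

Note that this only edits text. If the namespace is bound to a prefix that is used on element or attribute names, removing the declaration leaves those prefixes undeclared, so this is only safe for a default namespace or an unused prefix.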

Or you could clean up the input string. Find the first space and check whether it is followed by xmlns. If it is, delete the whole xmlns declaration up to the next space; if not, move on to the next space. Repeat until there are no more spaces or xmlns declarations, but don't go past the first >.
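The scan described above can be sketched as follows. This is one possible reading, not the answerer's code: the helper name strip_xmlns_from_first_tag is hypothetical, a regex stands in for the manual space-by-space search, and the sketch assumes the text starts with the element's start tag and that no attribute value contains a > character:

```python
import re

# Any xmlns or xmlns:prefix declaration, with single or double quotes.
XMLNS_ATTR = re.compile(r'\s+xmlns(?::[\w.-]+)?\s*=\s*(?:"[^"]*"|\'[^\']*\')')

def strip_xmlns_from_first_tag(xml_text):
    """Drop namespace declarations from the first start tag only (hypothetical helper)."""
    end = xml_text.find(">")
    if end == -1:
        return xml_text  # no complete tag found, leave the text untouched
    # Only the text before the first '>' is cleaned; the rest is kept as-is.
    head, rest = xml_text[:end], xml_text[end:]
    return XMLNS_ATTR.sub("", head) + rest

fragment = '<to_move xmlns="some" date="somedate"><child xmlns="kept">text</child></to_move>'
result = strip_xmlns_from_first_tag(fragment)
print(result)
```

Stopping at the first > keeps the change local to the outer element, so declarations on nested elements are left alone.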
