Why does the xml package modify my XML file in Python 3?
Problem description
I use the xml library in Python 3.5 for reading and writing an XML file. I don't modify the file; I just open it and write it back. But the library modifies the file.
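- Why is it modified?
- How can I prevent this? For example, I just want to replace a specific tag or its value in a quite complex XML file without losing any other information.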
This is the example file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
<title>Der Eisbär</title>
<ids>
<entry>
<key>tmdb</key>
<value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
</entry>
<entry>
<key>imdb</key>
<value xsi:type="xs:string" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">tt0167132</value>
</entry>
</ids>
</movie>
This is the code:
import xml.etree.ElementTree as ET
tree = ET.parse('x.nfo')
tree.write('y.nfo', encoding='utf-8')
And the XML file becomes this:
<movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<title>Der Eisbär</title>
<ids>
<entry>
<key>tmdb</key>
<value xsi:type="xs:int">9321</value>
</entry>
<entry>
<key>imdb</key>
<value xsi:type="xs:string">tt0167132</value>
</entry>
</ids>
</movie>
- Line 1 is gone.
- The <movie> tag in line 2 now has attributes.
- The <value> tags in lines 7 and 11 now have fewer attributes.
Recommended answer
Note that "xml package" and "the xml library" are ambiguous. There are several XML-related modules in the standard library: https://docs.python.org/3/library/xml.html.
Why is it modified?
ElementTree moves namespace declarations to the root element, and namespaces that aren't actually used in the document are removed. In the example, the xs prefix appears only inside the value of the xsi:type attribute, which ElementTree does not count as a use, so its declaration is dropped.
Why does ElementTree do this? I don't know, but perhaps it is a way to make the implementation simpler.
How can I prevent this? For example, I just want to replace a specific tag or its value in a quite complex XML file without losing any other information.
I don't think there is a way to prevent this. The issue has been brought up before. Here are two very similar questions with no answers:
- How do I parse and write XML using Python's ElementTree without moving namespaces around?
- Keep Existing Namespaces when overwriting XML file with ElementTree and Python
My suggestion is to use lxml instead of ElementTree. With lxml, the namespace declarations remain where they occur in the original file.
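As a rough illustration of the lxml suggestion, here is a minimal sketch. It assumes the third-party lxml package is installed (pip install lxml); the helper name roundtrip_with_lxml and the shortened SAMPLE document are made up for this example and are not part of the question. The output is not guaranteed to be byte-for-byte identical to the input (for instance, the quoting in the XML declaration may differ), but the namespace declarations should stay on the <value> element instead of moving to <movie>.

```python
import io

# Shortened copy of the example file from the question (illustrative only).
SAMPLE = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
<title>Der Eisbär</title>
<ids>
<entry>
<key>tmdb</key>
<value xsi:type="xs:int" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">9321</value>
</entry>
</ids>
</movie>
'''.encode("utf-8")


def roundtrip_with_lxml(xml_bytes):
    """Parse and re-serialize XML with lxml, returning the result as bytes."""
    # lxml is a third-party package; it is imported here so that this
    # sketch only requires it when the function is actually called.
    from lxml import etree

    tree = etree.parse(io.BytesIO(xml_bytes))
    # xml_declaration=True asks lxml to emit the <?xml ...?> line as well.
    return etree.tostring(tree, encoding="UTF-8", xml_declaration=True)
```

Calling print(roundtrip_with_lxml(SAMPLE).decode("utf-8")) lets you compare the result with the input. For files, the same idea should work with etree.parse('x.nfo') followed by tree.write('y.nfo', encoding='utf-8', xml_declaration=True), mirroring the code in the question.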
Line 1 is gone.
That line is the XML declaration. It is recommended but not mandatory to have one.
If you always want an XML declaration, use xml_declaration=True in the write() method call.
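A minimal sketch of that call with the standard library's ElementTree follows. The helper name write_with_declaration and the shortened SAMPLE document are assumptions made for this example, and an in-memory buffer stands in for the file. Note that the declaration is generated afresh, so it does not carry over standalone="yes" from the original and its quoting may differ.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the example file from the question.
SAMPLE = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
<title>Der Eisbär</title>
</movie>
'''.encode("utf-8")


def write_with_declaration(xml_bytes):
    """Round-trip XML through ElementTree, forcing an XML declaration."""
    tree = ET.parse(io.BytesIO(xml_bytes))
    buffer = io.BytesIO()
    # Without xml_declaration=True, ElementTree omits the declaration
    # when the encoding is UTF-8 or US-ASCII.
    tree.write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


first_line = write_with_declaration(SAMPLE).decode("utf-8").splitlines()[0]
print(first_line)
```

With a real file, the equivalent would be tree.write('y.nfo', encoding='utf-8', xml_declaration=True).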