Python lxml: Ignore XML declaration (errors)

Views: 321 Published: 2020/5/4 8:37:01 python xml lxml thunar

Problem description

I am trying to parse the custom actions file of the Thunar file browser (~/.config/Thunar/uca.xml) with the lxml Python module.

For some reason, Thunar apparently writes a malformed XML declaration into this file:

<?xml encoding="UTF-8" version="1.0"?>

The version is expected to appear as the first "attribute" in the declaration, so lxml raises an XMLSyntaxError when I try to parse the file.
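The failure can be reproduced without touching the real file. This is a minimal sketch under assumptions: only the declaration is taken from the question, while the empty `<actions/>` root and the helper name `try_strict_parse` are made up for illustration.

```python
from lxml import etree

# Stand-in for uca.xml: the declaration is the one quoted above,
# the root element is a hypothetical placeholder.
DATA = b'<?xml encoding="UTF-8" version="1.0"?>\n<actions/>'


def try_strict_parse(data):
    """Return None if lxml accepts the document, else the error message."""
    try:
        etree.fromstring(data)  # default parser, no recovery
    except etree.XMLSyntaxError as exc:
        return str(exc)
    return None


message = try_strict_parse(DATA)
print(message)
```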

And no, I cannot simply correct the declaration, because Thunar keeps overwriting it with the bogus one.

This is very likely a bug in Thunar.

Nevertheless, I would like to know how to make lxml ignore the XML declaration.

I know that I could pre-process the XML document to filter out the XML declaration, but that doesn't seem very elegant. Since XML seems to default to version 1.0 and UTF-8 encoding, there should be a way to ignore the declaration in lxml and assume those defaults. I didn't find anything in the documentation or on Google, but I might have overlooked something.
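For comparison, the pre-processing route mentioned here could look roughly like the following. It is a sketch, not a recommendation: the helper names `strip_xml_declaration` and `parse_without_declaration` and the sample `<actions>` content are hypothetical, and the byte-level regular expression assumes an ASCII-compatible encoding such as UTF-8.

```python
import re

from lxml import etree

# Matches a leading XML declaration regardless of the order of its
# pseudo-attributes. The \s after "xml" keeps it from matching other
# processing instructions such as <?xml-stylesheet ...?>.
_XML_DECL = re.compile(rb'\A\s*<\?xml\s[^>]*\?>')


def strip_xml_declaration(data: bytes) -> bytes:
    """Remove a leading XML declaration, if there is one."""
    return _XML_DECL.sub(b'', data, count=1)


def parse_without_declaration(data: bytes):
    """Parse the document after dropping its declaration."""
    return etree.fromstring(strip_xml_declaration(data))


# Hypothetical sample; only the declaration comes from the question.
sample = (
    b'<?xml encoding="UTF-8" version="1.0"?>\n'
    b'<actions><action><name>Open Terminal Here</name></action></actions>'
)
root = parse_without_declaration(sample)
print(root.tag, root.findtext('action/name'))
```

For the real file, the bytes would be read with `open(path, 'rb').read()` before being passed to `parse_without_declaration`.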

Recommended answer

I know very little about Thunar, but if it produces the XML declaration shown in the question, then that is a bug. An incorrect XML declaration makes the document ill-formed.

The XML grammar specifies exactly one correct order for the items in the XML declaration: version must come first and encoding second. See http://w3.org/TR/xml/#NT-XMLDecl.

However, lxml lets you parse with a parser instance that has the recover option set to True. That works in this case: the bad XML declaration is ignored.

from lxml import etree

# recover=True makes the parser try to continue past syntax errors
parser = etree.XMLParser(recover=True)
tree = etree.parse('uca.xml', parser)

See http://lxml.de/api/lxml.etree.XMLParser-class.html
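Because recover=True can also paper over other syntax errors, it may be worth checking what the parser skipped. The sketch below parses an in-memory stand-in for uca.xml and prints the parser's error_log. The sample `<action>` content is an assumption for illustration; only the declaration comes from the question.

```python
from lxml import etree

# Hypothetical stand-in for uca.xml with the malformed declaration.
SAMPLE = (
    b'<?xml encoding="UTF-8" version="1.0"?>\n'
    b'<actions>'
    b'<action><name>Open Terminal Here</name></action>'
    b'</actions>'
)

parser = etree.XMLParser(recover=True)
root = etree.fromstring(SAMPLE, parser)

# Collect the action names to confirm the document body was parsed.
names = [action.findtext('name') for action in root.iter('action')]
print(names)

# Each entry describes a problem the parser recovered from.
for entry in parser.error_log:
    print(entry.line, entry.column, entry.message)
```

If the log lists anything beyond the declaration problem, the file may have other defects that recovery silently worked around.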
