Checking if XML declaration is present


Problem description


I am trying to check whether an XML file contains the necessary XML declaration ("header"), for example:

<?xml version="1.0" encoding="UTF-8"?>
...rest of xml file...

I am using xml.etree.ElementTree to read the file and extract information from it, but it loads a file without complaint even when the header is missing.

What I tried so far is this:

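import re
import sys
import xml.etree.ElementTree as ET

tree = ET.parse(someXmlFile)

try:
    # Serialize the parsed tree back to text; note that ET.tostring() writes
    # its own declaration here, not the one (if any) from the original file
    xmlFile = ET.tostring(tree.getroot(), encoding='utf8').decode('utf8')
except Exception:
    sys.stderr.write("Wrong xml2 header\n")
    sys.exit(31)

if re.match(r"^\s*<\?xml version='1\.0' encoding='utf8'\?>\s+", xmlFile) is None:
    sys.stderr.write("Wrong xml1 header\n")
    sys.exit(31)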

But ET.tostring() simply "makes up" a header: the declaration in its output is generated by ElementTree itself, whether or not the file contained one.
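A minimal sketch (Python 3, standard library only) of why the check above cannot work: ElementTree keeps no record of the source declaration after parsing, and ET.tostring() decides whether to emit a declaration from its own arguments (here, `encoding`). The names `WITH_DECL`, `WITHOUT_DECL` and `roundtrip` are illustrative only.

```python
import xml.etree.ElementTree as ET

WITH_DECL = b'<?xml version="1.0" encoding="UTF-8"?>\n<root><item>1</item></root>'
WITHOUT_DECL = b"<root><item>1</item></root>"


def roundtrip(doc, encoding):
    """Parse `doc` with ElementTree and serialize its root element again."""
    return ET.tostring(ET.fromstring(doc), encoding=encoding)


with_decl = roundtrip(WITH_DECL, "utf8")
without_decl = roundtrip(WITHOUT_DECL, "utf8")

# The parsed tree keeps no trace of the source declaration, so both match.
print(with_decl == without_decl)  # True
# For encoding="utf8", tostring() prepends a declaration of its own.
print(with_decl.startswith(b"<?xml"))  # True
# For encoding="unicode" it returns a str with no declaration at all.
print(roundtrip(WITHOUT_DECL, "unicode").startswith("<?xml"))  # False
```

So any regex applied to ET.tostring() output is testing ElementTree's serializer, not the file.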

Is there any way to check for an XML header with ET? Or to make ET.parse raise an error while loading the file if the file does not contain an XML header?

Solution

tl;dr

from xml.dom.minidom import parseString

def has_xml_declaration(xml):
    # minidom sets Document.version only when the document has an XML declaration
    return parseString(xml).version is not None
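To see what the tl;dr relies on, here is a small self-contained sketch. It assumes the input is the raw document as bytes; `xml_declaration_info` and the sample documents are hypothetical names for illustration. minidom's expat-based builder fills in `Document.version` and `Document.encoding` only when it actually encounters a declaration, so both stay `None` otherwise.

```python
from xml.dom.minidom import parseString


def xml_declaration_info(data):
    """Return (version, encoding) from the XML declaration, or (None, None)."""
    doc = parseString(data)
    return doc.version, doc.encoding


samples = {
    "full": b'<?xml version="1.0" encoding="UTF-8"?><root/>',
    "version only": b'<?xml version="1.0"?><root/>',
    "none": b"<root/>",
}

for name, data in samples.items():
    # full -> ('1.0', 'UTF-8'); version only -> ('1.0', None); none -> (None, None)
    print(name, xml_declaration_info(data))
```

Because the `version` attribute is mandatory inside an XML declaration, testing `version is not None` is enough to detect whether a declaration is present; `encoding` alone would miss declarations that omit it.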

From Wikipedia's article on the XML declaration:

If an XML document lacks encoding specification, an XML parser assumes that the encoding is UTF-8 or UTF-16, unless the encoding has already been determined by a higher protocol.

...

The declaration may be optionally omitted because it declares as its encoding the default encoding. However, if the document instead makes use of XML 1.1 or another character encoding, a declaration is necessary. Internet Explorer prior to version 7 enters quirks mode, if it encounters an XML declaration in a document served as text/html
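The point about non-default encodings can be checked directly. A minimal sketch, using ElementTree on in-memory bytes: the same ISO-8859-1 bytes parse correctly when a declaration names the encoding, but fail without it, because the parser then falls back to UTF-8 and the single byte 0xE9 (é in Latin-1) is not valid UTF-8 there. `LATIN1_BODY` and `DECLARED` are illustrative names.

```python
import xml.etree.ElementTree as ET

# "café" encoded as ISO-8859-1: the é becomes the single byte 0xE9.
LATIN1_BODY = "<name>café</name>".encode("iso-8859-1")
DECLARED = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + LATIN1_BODY

# With the declaration, expat decodes the bytes as ISO-8859-1.
print(ET.fromstring(DECLARED).text)  # café

# Without it, the parser assumes UTF-8, where a lone 0xE9 byte is invalid.
try:
    ET.fromstring(LATIN1_BODY)
except ET.ParseError as err:
    print("ParseError:", err)
```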

So even if the XML declaration is omitted from the XML document, the code snippet:

if re.match(r"^<\?xml\s*version=\'1\.0\' encoding=\'utf8\'\s*\?>", xmlFile.decode('utf-8')) is None:

will still find "the" default XML declaration, because it inspects the output of ET.tostring(), which writes its own declaration for encoding='utf8', rather than the original file. Note that I have used xmlFile.decode('utf-8') instead of xmlFile. If you don't mind using minidom, you can use the following code snippet:

from xml.dom.minidom import parse

dom = parse('bookstore-003.xml')
# dom.version and dom.encoding are None when the file has no XML declaration
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))

Here is a working fiddle. In bookstore-001.xml an XML declaration is present, in bookstore-002.xml no XML declaration is present, and bookstore-003.xml has a different XML declaration from the first example. The print statement prints the version and encoding of each file accordingly:

<?xml version="1.0" encoding="UTF-8"?>

<?xml version="None" encoding="None"?>

<?xml version="1.0" encoding="ISO-8859-1"?>
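If building a whole DOM just to read the declaration is too heavy, a different technique (not the one in this answer) is to ask expat, the parser underneath both minidom and ElementTree, for the declaration directly via its `XmlDeclHandler` callback and stop at the first start tag. The sketch below also wraps this into an ET.parse-like loader that raises when the declaration is missing, which is what the question asked for. `read_xml_declaration` and `parse_requiring_declaration` are hypothetical helper names, and the input is assumed to be the raw bytes of the file.

```python
import io
import xml.etree.ElementTree as ET
import xml.parsers.expat


class _RootStarted(Exception):
    """Internal signal used to stop expat once the root element begins."""


def read_xml_declaration(data):
    """Return (version, encoding, standalone) from the XML declaration, or None.

    expat reports the declaration before any element, so parsing stops at the
    first start tag and no tree is built. Malformed input before that point
    raises xml.parsers.expat.ExpatError.
    """
    found = []

    def on_decl(version, encoding, standalone):
        # encoding is None if omitted; standalone is -1 if omitted, else 0 or 1
        found.append((version, encoding, standalone))

    def on_start(name, attrs):
        raise _RootStarted

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.StartElementHandler = on_start
    try:
        parser.Parse(data, True)
    except _RootStarted:
        pass
    return found[0] if found else None


def parse_requiring_declaration(source):
    """Like ET.parse(), but raise ValueError if the XML declaration is missing.

    `source` is a filename or a binary file object.
    """
    if hasattr(source, "read"):
        data = source.read()
    else:
        with open(source, "rb") as f:
            data = f.read()
    if read_xml_declaration(data) is None:
        raise ValueError("document has no XML declaration")
    return ET.ElementTree(ET.fromstring(data))


print(read_xml_declaration(b'<?xml version="1.0" encoding="ISO-8859-1"?><bookstore/>'))
# ('1.0', 'ISO-8859-1', -1)
print(read_xml_declaration(b"<bookstore/>"))  # None
tree = parse_requiring_declaration(io.BytesIO(b'<?xml version="1.0"?><bookstore/>'))
print(tree.getroot().tag)  # bookstore
```

Because parsing stops at the root start tag, read_xml_declaration on its own does not validate the rest of the document; parse_requiring_declaration still does, since it hands the full bytes to ElementTree afterwards.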
