How to read XML header in Python

Posted: 2021/10/1 20:09:55 | Tags: python, xml, python-3.x, xml-parsing

Question

How can I read the header of an XML document in Python 3?

Ideally, I would use the defusedxml module as the documentation states that it's safer, but at this point (after hours of trying to figure this out), I'd settle for any parser.

For example, I have a document (this is actually from an exercise) that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root -->
    <!-- CONTENTS -->
</plist>

I'm wondering how to access everything before the root node.

This seems like such a general question that I thought I would easily find an answer online, but I guess I was wrong. The closest thing I found was this question on Stack Overflow, which didn't really help (I looked into xml.sax, but couldn't find anything relevant).
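For reference, xml.sax does not report the XML declaration at all, but its lexical-handler property does report the DOCTYPE (and comments). A minimal sketch, assuming Python 3 and the default expat-based reader returned by xml.sax.make_parser(); the names PrologRecorder, StopAtRoot, read_prolog_sax and _RootReached are hypothetical. Parsing is cut short by raising a private exception from startElement once the root tag is reached.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class _RootReached(Exception):
    """Hypothetical marker exception: raised to stop at the root start tag."""


class PrologRecorder:
    """Lexical handler that records the DOCTYPE and any prolog comments.

    The default expat-based reader expects all five methods to exist.
    (On Python 3.10+ you could subclass xml.sax.handler.LexicalHandler.)
    """

    def __init__(self):
        self.doctype = None  # (name, public_id, system_id)
        self.comments = []

    def startDTD(self, name, public_id, system_id):
        self.doctype = (name, public_id, system_id)

    def endDTD(self):
        pass

    def comment(self, content):
        self.comments.append(content)

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


class StopAtRoot(ContentHandler):
    def startElement(self, name, attrs):
        # The first element is the root, so the prolog is over.
        raise _RootReached(name)


def read_prolog_sax(data: bytes) -> PrologRecorder:
    recorder = PrologRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(StopAtRoot())
    parser.setProperty(property_lexical_handler, recorder)
    try:
        parser.parse(io.BytesIO(data))
    except _RootReached:
        pass  # root reached; everything before it has been recorded
    return recorder
```

The XML declaration's version and encoding still have to come from another API, which is why a DOM or raw expat approach is usually simpler for this question.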

Answer

I tried minidom, which, according to the link you provided, is vulnerable to billion laughs and quadratic blowup attacks. Here is my code:

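from xml.dom.minidom import parse

dom = parse('file.xml')
# The XML declaration is not a node; minidom stores its fields on the Document
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))
# The DOCTYPE is exposed as a DocumentType node
print(dom.doctype.toxml())
# or: the node immediately before the root element
print(dom.getElementsByTagName('plist')[0].previousSibling.toxml())
# or: the first child of the Document
print(dom.childNodes[0].toxml())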

Output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
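The code above picks out the DOCTYPE specifically. To collect everything before the root element, including comments or processing instructions in the prolog, one option is to walk the Document's childNodes until the first element node. A minimal standard-library sketch; prolog_nodes, header_text and SAMPLE are hypothetical names, and the XML declaration is rebuilt from dom.version and dom.encoding because minidom does not keep it as a node.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def prolog_nodes(dom):
    """Return the Document's child nodes that precede the root element."""
    nodes = []
    for child in dom.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            break  # reached the root element, so the prolog is over
        nodes.append(child)
    return nodes


def header_text(dom):
    """Rebuild the header: XML declaration (if any) plus the prolog nodes."""
    lines = []
    if dom.version is not None:
        # Declaration fields live on the Document, not in childNodes
        decl = '<?xml version="{}"'.format(dom.version)
        if dom.encoding:
            decl += ' encoding="{}"'.format(dom.encoding)
        lines.append(decl + '?>')
    lines.extend(node.toxml() for node in prolog_nodes(dom))
    return '\n'.join(lines)


SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!-- exported by some tool -->
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict/></plist>"""

print(header_text(parseString(SAMPLE)))
```

The same loop should work on a Document returned by defusedxml.minidom, though this sketch only exercises the standard library.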

You can use minidom from defusedxml. I downloaded that package, replaced the import with from defusedxml.minidom import parse, and the code produced the same output.
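If only the header is needed, another option (not from the original answer) is the standard library's low-level xml.parsers.expat parser: register handlers for the XML declaration, the DOCTYPE and comments, then raise from the start-element handler so no callbacks run for anything after the root tag, and no DOM is built for the rest of the file. A minimal sketch; read_header and _RootReached are hypothetical names, and this does not by itself add the protections defusedxml provides.

```python
import xml.parsers.expat


class _RootReached(Exception):
    """Hypothetical marker exception raised at the root start tag."""


def read_header(data: bytes) -> dict:
    """Collect the XML declaration, DOCTYPE and prolog comments, then stop."""
    header = {"xml_decl": None, "doctype": None, "comments": [], "root": None}

    def on_xml_decl(version, encoding, standalone):
        # standalone is -1 when the declaration omits it
        header["xml_decl"] = (version, encoding, standalone)

    def on_doctype(name, system_id, public_id, has_internal_subset):
        header["doctype"] = (name, public_id, system_id)

    def on_comment(text):
        header["comments"].append(text)

    def on_start(name, attrs):
        header["root"] = name
        raise _RootReached  # stop here; no further handlers are called

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.CommentHandler = on_comment
    parser.StartElementHandler = on_start
    try:
        parser.Parse(data, True)
    except _RootReached:
        pass
    return header
```

For a file on disk, parser.ParseFile(f) with f opened in binary mode can replace parser.Parse(data, True).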
