lxml/python reading xml with CDATA section

Published: 2021/4/21 19:48:41. Tags: python, python-3.x, lxml, elementtree, cdata

Question

In my xml I have a CDATA section. I want to keep the CDATA part, and then strip it. Can someone help with the following?

The default settings do not keep it:

>>> from io import StringIO
>>> from lxml import etree
>>> xml = '<Subject> My Subject: 美海軍研究船勘查台海水文？ 船<![CDATA[&#xE9;]]>€ </Subject>'
>>> tree = etree.parse(StringIO(xml))
>>> tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文？ 船&#xE9;€ '

This post seems to suggest that a parser option strip_cdata=False may keep the cdata, but it has no effect:

>>> parser = etree.XMLParser(strip_cdata=False)
>>> tree = etree.parse(StringIO(xml), parser=parser)
>>> tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文？ 船&#xE9;€ '

Using strip_cdata=True, which should be the default, yields the same:

>>> parser = etree.XMLParser(strip_cdata=True)
>>> tree = etree.parse(StringIO(xml), parser=parser)
>>> tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文？ 船&#xE9;€ '

Recommended answer

CDATA sections are not preserved in the text property of an element, even if strip_cdata=False is used when the XML content is parsed, as you have noticed. See https://lxml.de/api.html#cdata.

CDATA sections are preserved, provided the document was parsed with strip_cdata=False:

When serializing with tostring():

print(etree.tostring(tree.getroot(), encoding="UTF-8").decode())
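To make the contrast concrete, here is a minimal sketch that parses the same document twice, once with strip_cdata=False and once with strip_cdata=True, and compares both the text property and the serialized output. The shortened sample string and the helper name parse_subject are assumptions made for illustration; they are not part of the original question.

```python
from lxml import etree

# Shortened sample document (an assumption; the question uses a longer string).
xml = "<Subject>ship <![CDATA[&#xE9;]]> end</Subject>"


def parse_subject(strip_cdata):
    """Parse the sample with the given strip_cdata setting (hypothetical helper)."""
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    return etree.fromstring(xml, parser)


kept = parse_subject(strip_cdata=False)
stripped = parse_subject(strip_cdata=True)

# The text property holds the plain content in both cases,
# without the <![CDATA[ ... ]]> markers.
print(repr(kept.text))
print(repr(stripped.text))

# The difference only becomes visible when the element is serialized.
kept_xml = etree.tostring(kept, encoding="unicode")
stripped_xml = etree.tostring(stripped, encoding="unicode")
print(kept_xml)      # CDATA section written out as a CDATA section
print(stripped_xml)  # same content written as escaped character data
```

Note that the tree has to come from a parser created with strip_cdata=False. A tree parsed with the default parser no longer contains a CDATA node, so nothing is left to serialize as one.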

When writing to a file:

tree.write("subject.xml", encoding="UTF-8")
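The file case can be checked in the same way. The sketch below writes the tree to a temporary directory and reads the file back as text to confirm that the CDATA markers survive. The temporary directory and the shortened sample string are assumptions for illustration only.

```python
import os
import tempfile
from io import StringIO

from lxml import etree

# Shortened sample document (an assumption; the question uses a longer string).
xml = "<Subject>ship <![CDATA[&#xE9;]]> end</Subject>"

# strip_cdata=False keeps the CDATA node in the parsed tree.
parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse(StringIO(xml), parser=parser)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "subject.xml")
    tree.write(path, encoding="UTF-8")

    # Read the raw file content back to inspect what was written.
    with open(path, encoding="UTF-8") as handle:
        written = handle.read()

print(written)
```

If the CDATA section should be dropped afterwards, one option is to parse the written file again with the default parser, which replaces CDATA sections with their plain text content.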
