Escape unescaped characters in XML with Python


Question

I need to escape special characters in an invalid XML file which is about 5000 lines long. Here's an example of the XML that I have to deal with:

<root>
 <element>
  <name>name & surname</name>
  <mail>name@name.org</mail>
 </element>
</root>

Here the problem is the character "&" in the name. How would you escape special characters like this with a Python library? I didn't find a way to do it with BeautifulSoup.
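To make the failure concrete, here is a minimal check using the standard library's xml.etree.ElementTree (an assumption for illustration; the question itself only mentions BeautifulSoup). The helper name is_well_formed is hypothetical. A strict parser rejects the sample because of the bare "&", and accepts it once that character is written as "&amp;".

```python
import xml.etree.ElementTree as ET

# The sample from the question, with the bare "&" left in place.
broken_xml = """<root>
 <element>
  <name>name & surname</name>
  <mail>name@name.org</mail>
 </element>
</root>"""


def is_well_formed(text):
    """Return True if a strict XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(broken_xml))                            # False
print(is_well_formed(broken_xml.replace(" & ", " &amp; ")))  # True
```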

Recommended answer

If you don't need to keep the invalid characters, you can use the recover option of lxml's XML parser (see "Parsing broken XML with lxml.etree.iterparse"). Note that it drops the offending characters rather than escaping them:

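from lxml import etree

# broken_xml holds the invalid XML text shown in the question.
parser = etree.XMLParser(recover=True)  # recover from bad characters
root = etree.fromstring(broken_xml, parser=parser)
print(etree.tostring(root).decode())  # tostring() returns bytes on Python 3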

Output

<root>
<element>
<name>name  surname</name>
<mail>name@name.org</mail>
</element>
</root>
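As the output shows, the recover option removes the "&" instead of escaping it. If the character has to be preserved, one possible alternative (not part of the original answer) is to pre-process the text with a regular expression before parsing. The sketch below assumes the only problem in the file is bare ampersands; the names BARE_AMP and escape_bare_ampersands are hypothetical, and the approach does not handle stray "<" characters, CDATA sections, or custom entities declared in a DTD.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" that does not already start one of the five predefined
# entities or a numeric character reference such as &#38; or &#x26;.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace every bare "&" with "&amp;" (hypothetical helper)."""
    return BARE_AMP.sub("&amp;", text)


broken_xml = """<root>
 <element>
  <name>name & surname</name>
  <mail>name@name.org</mail>
 </element>
</root>"""

fixed_xml = escape_bare_ampersands(broken_xml)
root = ET.fromstring(fixed_xml)          # now parses with a strict parser
print(root.find("element/name").text)    # name & surname
```

Ampersands that are already escaped are left alone, so running the helper twice on the same text does not produce "&amp;amp;".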
