Reading XML header encoding

Views: 140 Published: 2017/8/16 23:52:41 python xml encoding

Problem description

I have a number of XML files I'd like to process with a script, converting them from whatever encoding that they're in to UTF-8.

Using the code given in this great answer I can do the conversion, but how can I read the encoding given in the XML header?

For example, I have many files which are already in UTF-8, which should be left alone:

<?xml version="1.0" encoding="utf-8"?>

However, I have a lot of files which do need to be converted:

<?xml version="1.0" encoding="windows-1255"?>

How can I detect the XML encoding specified in the headers of these files in Python? Better, after I detect and reencode the files, how then can I change this XML header to read "utf-8" to avoid processing it in the future?

Solution

Use lxml to do the parsing; you can then access the original encoding with:

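from lxml import etree

def is_already_utf8(filename):
    # Open in binary mode so lxml decodes the bytes using the declared encoding
    with open(filename, 'rb') as xmlfile:
        tree = etree.parse(xmlfile)
    # docinfo.encoding is the encoding the parser found for the document;
    # compare case-insensitively, since the header may say "UTF-8" or "utf-8"
    return (tree.docinfo.encoding or 'utf-8').lower() == 'utf-8'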

You can then use lxml to write the file out again in UTF-8.
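
The write-out step could look roughly like the sketch below. It is only an illustration: the helper name `convert_to_utf8` is made up for this example, and it assumes that overwriting the file in place is acceptable. It relies on lxml's `ElementTree.write()` with `encoding` and `xml_declaration=True`, which emits a new header naming the target encoding, so the file is skipped on later runs.

```python
from lxml import etree


def convert_to_utf8(filename):
    """Re-encode an XML file as UTF-8 in place (hypothetical helper).

    Returns True if the file was rewritten, False if it was already UTF-8.
    """
    # Passing the path lets lxml read the raw bytes and apply the
    # encoding named in the XML declaration.
    tree = etree.parse(filename)

    # Treat a missing encoding as UTF-8, the XML default.
    declared = tree.docinfo.encoding or "UTF-8"
    if declared.lower() == "utf-8":
        return False

    # xml_declaration=True writes a fresh header with encoding="UTF-8".
    tree.write(filename, encoding="UTF-8", xml_declaration=True)
    return True
```

Calling `convert_to_utf8(path)` on each file in a directory would leave the UTF-8 files untouched and rewrite the rest. Note that lxml reserializes the document, so details such as quoting style in the declaration may differ from the original file.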
