What is the right way to use Cyrillic in the Python lxml library?

Views: 78 Published: 2020/5/4 8:34:16 Tags: python xml lxml cyrillic

Question

I am trying to generate .xml files that contain Cyrillic characters, but the result is unexpected. What is the simplest way to avoid this? Example:

from lxml import etree
root = etree.Element('пример')
print(etree.tostring(root))

What I get is:

b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'

instead of:

b'<пример/>'

Recommended answer

etree.tostring() without additional arguments outputs ASCII-only data as a bytes object. You could use etree.tounicode():

>>> from lxml import etree
>>> root = etree.Element('пример')
>>> print(etree.tostring(root))
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
>>> print(etree.tounicode(root))
<пример/>

or specify a codec with the encoding argument; you'd still get bytes however, so the output would need to be decoded again:

>>> print(etree.tostring(root, encoding='utf8'))
b'<\xd0\xbf\xd1\x80\xd0\xb8\xd0\xbc\xd0\xb5\xd1\x80/>'
>>> print(etree.tostring(root, encoding='utf8').decode('utf8'))
<пример/>
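
Since the question is about generating .xml files, the UTF-8 bytes can also be written straight to disk without decoding them. The following is a minimal sketch, assuming a throwaway temporary directory and the hypothetical file name example.xml (neither is from the original answer); it uses ElementTree.write() with an explicit encoding and an XML declaration:

```python
import os
import tempfile

from lxml import etree

root = etree.Element('пример')

# Hypothetical output location, chosen only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example.xml')

# Serialize as UTF-8 bytes and prepend an XML declaration that names the encoding.
etree.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)

# Read the file back as raw bytes to inspect what was actually written.
with open(path, 'rb') as f:
    data = f.read()

print(data.decode('utf-8'))
```

Opening the file in binary mode matters here because the serializer produces bytes; a text-mode file would only be appropriate for the str output of encoding='unicode'.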

Setting the encoding to unicode gives you the same output tounicode() produces, and is the preferred spelling:

>>> print(etree.tostring(root, encoding='unicode'))
<пример/>
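
To make the difference between the variants concrete, here is a small sketch (the variable names are only illustrative) that compares the return types and checks that the decoded UTF-8 bytes match the str produced by encoding='unicode':

```python
from lxml import etree

root = etree.Element('пример')

ascii_bytes = etree.tostring(root)                   # default: ASCII-only bytes
utf8_bytes = etree.tostring(root, encoding='utf-8')  # UTF-8 encoded bytes
text = etree.tostring(root, encoding='unicode')      # str, no encoding step

# The first two calls return bytes, the last one returns str.
print(type(ascii_bytes).__name__, type(utf8_bytes).__name__, type(text).__name__)

# Decoding the UTF-8 bytes yields the same text as encoding='unicode'.
print(utf8_bytes.decode('utf-8') == text)

# The UTF-8 bytes parse back into an element with the original tag.
print(etree.fromstring(utf8_bytes).tag == root.tag)
```

Use the bytes form when writing to a binary file or a socket, and the str form when the result is going to be printed or combined with other text.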
