What is the right way to use Cyrillic in the Python lxml library?
Question
I am trying to generate .xml files that contain Cyrillic characters, but the result is unexpected. What is the simplest way to avoid this? Example:
from lxml import etree
root = etree.Element('пример')
print(etree.tostring(root))
What I get is:
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
Instead of:
b'<пример/>'
Recommended answer
etree.tostring() without additional arguments outputs ASCII-only data as a bytes object, so non-ASCII characters are written as numeric character references. You could use etree.tounicode() instead:
>>> from lxml import etree
>>> root = etree.Element('пример')
>>> print(etree.tostring(root))
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
>>> print(etree.tounicode(root))
<пример/>
Or specify a codec with the encoding argument. You would still get bytes, however, so the output would need to be decoded again:
>>> print(etree.tostring(root, encoding='utf8'))
b'<\xd0\xbf\xd1\x80\xd0\xb8\xd0\xbc\xd0\xb5\xd1\x80/>'
>>> print(etree.tostring(root, encoding='utf8').decode('utf8'))
<пример/>
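As a minimal sketch of the bytes-versus-text distinction described above, the snippet below checks the types involved and parses the UTF-8 bytes back into an element. The variable names raw, text, and parsed are illustrative only and do not come from the original answer.

```python
from lxml import etree

root = etree.Element('пример')

# tostring() with an explicit codec returns bytes encoded in that codec.
raw = etree.tostring(root, encoding='utf-8')

# Decoding with the same codec gives a readable str.
text = raw.decode('utf-8')

# Parsing the UTF-8 bytes back yields an element with the original tag name.
parsed = etree.fromstring(raw)

print(type(raw).__name__, type(text).__name__)
print(text)
print(parsed.tag)
```

The bytes form is usually what you want when writing to a file or socket; the decoded str is mainly useful for display.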
Setting encoding to 'unicode' gives you the same output that tounicode() produces, and is the preferred spelling:
>>> print(etree.tostring(root, encoding='unicode'))
<пример/>
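Since the question is about generating .xml files, here is a hedged sketch of writing the tree straight to disk as UTF-8 instead of printing a string. It relies on lxml's ElementTree.write() with its encoding and xml_declaration arguments. The temporary directory, the file name example.xml, and the element text are assumptions made for this illustration.

```python
import os
import tempfile

from lxml import etree

root = etree.Element('пример')
root.text = 'текст'  # sample Cyrillic content, assumed for the example

# Hypothetical output location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'example.xml')

# Write UTF-8 bytes and include an XML declaration that names the encoding,
# so other tools know how to decode the file.
etree.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)

# Read the file back as text to check that the Cyrillic survived.
with open(path, encoding='utf-8') as f:
    content = f.read()

print(content)
```

Writing through write() avoids the manual decode step, because the library produces the encoded bytes and the file stores them as they are.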