How to prevent str from encoding unicode characters as hex codes?
Question
When I print a unicode string in Python directly, I see the same characters that are in my string. When I embed it in a container (a list, a dict, etc.), the str representation converts the non-ASCII characters to \uXXXX escapes.
Interestingly, I can print a container that holds the string, but I cannot print str of the string itself (it raises a UnicodeEncodeError).
Can I configure str to encode nested strings as UTF-8? Looking at these hex escapes makes debugging very painful.
Example:
>>> v = u"abc123абв"
>>> d = [v]
>>> print v
abc123абв
>>> print d
[u'abc123\u0430\u0431\u0432']
>>> print str(v)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-8: ordinal not in range(128)
>>> print str(d)
[u'abc123\u0430\u0431\u0432']
I'm using Python 2.7.6 on Ubuntu, and the console encoding is UTF-8. Python seems to use UTF-8 as well:
>>> import locale, sys
>>> print(sys.stdout.encoding)
UTF-8
>>> print(locale.getpreferredencoding())
UTF-8
>>> print(sys.getfilesystemencoding())
UTF-8
Recommended answer
print [v] calls repr(v), which returns ASCII-printable characters as is and escapes everything else using \x, \u, \U, ...
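The fact that a container's str() is built from the repr() of its elements can be seen with a small sketch. The Word class below is hypothetical and exists only to make the two representations visibly different; it is not part of the original question.

```python
class Word(object):
    """Hypothetical class whose str() and repr() are easy to tell apart."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return "str:" + self.text

    def __repr__(self):
        return "repr:" + self.text


w = Word("abc")
direct = str(w)    # uses Word.__str__
nested = str([w])  # the list formats each element with repr()
print(direct)
print(nested)
```

The same thing happens to a unicode string inside a list: the list never asks the element for its str(), so the escaped repr() is what gets shown.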
Remember that an object such as dict(a=1) is different from its text representation (repr(dict(a=1))). A Unicode string is an object too (type(v) == unicode) like any other, and therefore repr(v) is not v (incidentally, repr(repr(v)) is not repr(v) either; think about it).
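A short sketch of that distinction, using a plain ASCII value so the result does not depend on the console encoding. The variable names are illustrative only.

```python
v = u"abc"
r = repr(v)   # a new string that contains quote characters around the text
rr = repr(r)  # the representation of that string, quoted once more

print(v)
print(r)
print(rr)
```

Each call to repr() describes the object it is given, so every level adds its own quoting and escaping rather than returning the original text.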
To display human-readable text for debugging in the Python console, you could provide a custom sys.displayhook; e.g., you could encode any (embedded) unicode object using sys.stdout.encoding. In Python 3, repr(unicode_string) returns Unicode characters that are printable in the current environment as is (characters that would cause UnicodeEncodeError are escaped).
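One possible shape for such a hook is sketched below. The helper names readable and show are hypothetical, the sketch handles only lists, tuples and dicts, it does not escape quotes inside strings, and unlike the default hook it does not store the result in `_`. It is meant as a debugging aid under those assumptions, not as a drop-in replacement for repr().

```python
import sys

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def readable(obj):
    """Format obj roughly like repr(), but leave nested text unescaped."""
    if isinstance(obj, text_type):
        return u"'" + obj + u"'"
    if isinstance(obj, list):
        return u"[" + u", ".join(readable(x) for x in obj) + u"]"
    if isinstance(obj, tuple):
        return u"(" + u", ".join(readable(x) for x in obj) + u")"
    if isinstance(obj, dict):
        items = (readable(k) + u": " + readable(v) for k, v in obj.items())
        return u"{" + u", ".join(items) + u"}"
    return text_type(repr(obj))


def show(value):
    """Hypothetical sys.displayhook replacement for interactive sessions."""
    if value is None:
        return
    # Fall back to UTF-8 when stdout does not report an encoding (assumption).
    encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    # Characters the console cannot encode are backslash-escaped, not fatal.
    data = readable(value).encode(encoding, "backslashreplace")
    if sys.version_info[0] >= 3:
        sys.stdout.write(data.decode(encoding) + "\n")
    else:
        sys.stdout.write(data + b"\n")


sys.displayhook = show
```

With the hook installed, evaluating d at the interactive prompt shows the Cyrillic characters instead of \uXXXX escapes. Note that print d still goes through str(d) and is unaffected; for that case, calling print readable(d) explicitly is one option.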
The UnicodeEncodeError raised by str(v) is unrelated. str(v) calls v.encode(sys.getdefaultencoding()) (ASCII by default on Python 2), and therefore it fails for any unicode string with non-ASCII characters. Do not call str() on Unicode strings (it is almost always an error); print Unicode directly instead.
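The failure can be reproduced without str() by encoding explicitly. The sketch below passes "ascii" by hand, on the assumption that this matches the Python 2 default encoding, and then shows that choosing UTF-8 yourself succeeds.

```python
v = u"abc123\u0430\u0431\u0432"

error = None
try:
    v.encode("ascii")  # what str(v) effectively attempts on Python 2
except UnicodeEncodeError as exc:
    error = exc  # the three Cyrillic characters are outside ASCII

utf8_bytes = v.encode("utf-8")  # an explicit encoding choice works
print(error)
print(len(utf8_bytes))
```

If bytes are really needed (for a file or a socket, say), encoding explicitly like this is the safer route; for display, printing the Unicode string directly lets Python use the console encoding.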