How to prevent str from encoding unicode characters as hex codes?

Problem description

When I print a unicode string in Python directly, I see the same characters that are in my string. When I embed it in a container (a list, a dict, etc.), the str representation of the container converts the non-ASCII characters to \uXXXX escapes. Interestingly, I can print a container holding the string, but I cannot print str() of the string itself (it raises a UnicodeEncodeError).

Can I configure str to render nested strings as UTF-8 text? Looking at these hex escapes makes debugging very painful.

Example:

>>> v = u"abc123абв"
>>> d = [v]
>>> print v
abc123абв
>>> print d
[u'abc123\u0430\u0431\u0432']
>>> print str(v)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-8: ordinal not in range(128)
>>> print str(d)
[u'abc123\u0430\u0431\u0432']



I'm using Python 2.7.6 on Ubuntu, and the console encoding is UTF-8. Python seems to use UTF-8 as well:

>>> import sys, locale
>>> print(sys.stdout.encoding)
UTF-8
>>> print(locale.getpreferredencoding())
UTF-8
>>> print(sys.getfilesystemencoding())
UTF-8


Recommended answer

print [v] calls str() on the list, which in turn calls repr(v) on each element; in Python 2, repr(v) returns ASCII-printable characters as is and escapes everything else using \x, \u, \U, ...
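As a small check of the point above, the following sketch compares the text of a one-element list with the repr of its element. The variable names are illustrative only, and the snippet is written so that it should run on both Python 2.7 and Python 3.

```python
# The sample string from the question, written with escapes so the
# source file itself stays pure ASCII.
v = u"abc123\u0430\u0431\u0432"

# str() of a list is built from repr() of each element, not str().
container_text = str([v])
element_repr = repr(v)

# Holds on Python 2 (escaped form) and Python 3 (readable form) alike.
assert container_text == "[" + element_repr + "]"
```

On Python 3 the built-in ascii() produces the escaped form that repr() gives in Python 2, which can be handy when comparing the two behaviours.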

Remember that an object such as dict(a=1) is different from its text representation (repr(dict(a=1))). A Unicode string is an object like any other (type(v) == unicode), and therefore repr(v) is not v (by the way, repr(repr(v)) is not repr(v) either; think about it).
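To make the "think about it" concrete, here is a minimal sketch showing that each call to repr() wraps the previous text in another layer of quoting. The names r1 and r2 are illustrative; the exact characters differ between Python 2 and Python 3, but the inequalities should hold on both.

```python
v = u"abc123\u0430\u0431\u0432"

r1 = repr(v)    # text describing the string: quotes (and escapes) added
r2 = repr(r1)   # text describing that text: yet another layer of quotes

# Each representation is a different, longer string than the one before.
assert r1 != v
assert r2 != r1
assert len(r2) > len(r1) > len(v)
```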

To display human-readable text for debugging in the Python console, you could provide a custom sys.displayhook; for example, it could encode any (embedded) unicode object using sys.stdout.encoding. In Python 3, repr(unicode_string) returns Unicode characters that are printable in the current environment as is (characters that would cause UnicodeEncodeError are escaped).
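One possible shape for such a hook is sketched below. It is not taken from the answer: readable() and readable_displayhook() are hypothetical helpers, the quoting is naive (quotes and control characters inside strings are not escaped), only lists, tuples and dicts are walked, and the `_` variable of the standard hook is not maintained. It is written with the intent of running on both Python 2.7 and Python 3.

```python
import sys

try:
    text_type = unicode  # Python 2: the separate unicode type
except NameError:
    text_type = str      # Python 3: str is already Unicode text


def readable(obj):
    """Hypothetical helper: repr-like text with embedded text left unescaped."""
    if isinstance(obj, text_type):
        return u"'%s'" % obj  # naive quoting, for debugging output only
    if isinstance(obj, list):
        return u"[%s]" % u", ".join(readable(x) for x in obj)
    if isinstance(obj, tuple):
        inner = u", ".join(readable(x) for x in obj)
        return u"(%s,)" % inner if len(obj) == 1 else u"(%s)" % inner
    if isinstance(obj, dict):
        pairs = (u"%s: %s" % (readable(k), readable(val))
                 for k, val in obj.items())
        return u"{%s}" % u", ".join(pairs)
    # Anything else falls back to the ordinary repr.
    return text_type(repr(obj))


def readable_displayhook(value):
    """Sketch of a sys.displayhook replacement for interactive sessions."""
    if value is None:
        return
    text = readable(value)
    if sys.version_info[0] < 3:
        # Python 2: encode explicitly with the console encoding.
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
        text = text.encode(encoding, "backslashreplace")
    sys.stdout.write(text + "\n")


# Only affects expressions evaluated at the interactive prompt.
sys.displayhook = readable_displayhook
```

With the hook installed, typing d at the prompt would show the Cyrillic letters rather than \uXXXX escapes, provided the console can display them. print d is unaffected, because print does not go through sys.displayhook.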

That str(v) raises UnicodeEncodeError is unrelated. str(v) calls v.encode(sys.getdefaultencoding()), and therefore it fails for any unicode string containing non-ASCII characters. Do not call str() on Unicode strings (it is almost always an error); print Unicode directly instead.
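The failure can be reproduced without str() by encoding explicitly. In Python 2 the default encoding is normally 'ascii', so the sketch below spells that codec out by name; the variable names are illustrative, and the snippet should behave the same on Python 2.7 and Python 3.

```python
v = u"abc123\u0430\u0431\u0432"

failed = None
try:
    # What str(v) effectively attempts in Python 2 with the default codec.
    v.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.end is exclusive, so (6, 9) matches "position 6-8" in the traceback.
    failed = (exc.start, exc.end)

# Choosing a codec that covers the characters works fine.
utf8_bytes = v.encode("utf-8")

assert failed == (6, 9)
assert utf8_bytes.decode("utf-8") == v
```

When bytes are really needed, calling v.encode() with an explicit codec is the safer habit; for display, printing the unicode object directly is enough.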
