Any gotchas using unicode_literals in Python 2.6?


Question

We've already gotten our code base running under Python 2.6. In order to prepare for Python 3.0, we've started adding:

from __future__ import unicode_literals

into our .py files (as we modify them). I'm wondering if anyone else has been doing this and has run into any non-obvious gotchas (perhaps after spending a lot of time debugging).
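
As a rough illustration of what the import changes, here is a minimal sketch (assuming Python 2.6+ or Python 3). In a module with the future import, a bare literal such as 'abc' becomes a unicode string on Python 2 (and is str on Python 3), while a b'' literal stays a byte string. Modules without the import keep producing byte strings on Python 2. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys

# With unicode_literals in effect, a bare literal is text:
# unicode on Python 2, str on Python 3.
literal_type = type('abc')

# A b'' prefix still yields a byte string (str on Python 2, bytes on Python 3).
bytes_literal_type = type(b'abc')

if sys.version_info[0] >= 3:
    expected_text_type = str
else:
    expected_text_type = unicode  # this name only exists on Python 2

print(literal_type is expected_text_type)
print(bytes_literal_type is bytes)
```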

Solution

The main source of problems I've had with unicode strings is mixing UTF-8 encoded byte strings with unicode strings.

For example, consider the following scripts.

two.py

# encoding: utf-8
name = 'helló wörld from two'

one.py

# encoding: utf-8
from __future__ import unicode_literals
import two
name = 'helló wörld from one'
print name + two.name

The output of running python one.py is:

Traceback (most recent call last):
  File "one.py", line 5, in <module>
    print name + two.name
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

In this example, two.name is a UTF-8 encoded byte string (not unicode) because two.py does not import unicode_literals, while name in one.py is a unicode string. When you mix the two, Python 2 implicitly decodes the byte string with the default codec (ASCII) to convert it to unicode, and that fails on the first non-ASCII byte. It works if you decode explicitly: print name + two.name.decode('utf-8').
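
One way to avoid this is to decode byte strings explicitly wherever they enter a unicode_literals module. The sketch below assumes UTF-8 input; to_text, text_type and two_name are hypothetical names (two_name stands in for two.name). It runs on Python 2.6+ and on Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Under unicode_literals, type('') is unicode on Python 2 and str on Python 3.
text_type = type('')


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as text, decoding byte strings."""
    # bytes is an alias for str on Python 2.6+ and the real bytes type on Python 3.
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError('expected bytes or text, got %r' % type(value))


# A UTF-8 encoded byte string, like two.name in the example above.
two_name = 'helló wörld from two'.encode('utf-8')
name = 'helló wörld from one'

# Explicit decode: no implicit ASCII decode is attempted.
combined = name + to_text(two_name)
```

On Python 3 the implicit decode never happens at all: name + two_name with a str and a bytes operand raises TypeError. Decoding explicitly at the boundary therefore carries over unchanged.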

The same thing can happen if you encode a string and later mix it with a unicode string. For example, this works:

# encoding: utf-8
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

DEBUG: <html><body>helló wörld</body></html>

But after adding from __future__ import unicode_literals, it no longer works:

# encoding: utf-8
from __future__ import unicode_literals
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print 'DEBUG: %s' % html
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

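It fails because 'DEBUG: %s' is now a unicode string, so Python tries to decode html (which was just encoded to UTF-8 bytes) with the ASCII codec. There are two ways to fix the print. You can write print str('DEBUG: %s') % html, which turns the template back into a byte string. Or you can write print 'DEBUG: %s' % html.decode('utf-8'), which keeps everything unicode.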
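
Here is a minimal sketch of the second fix, generalized into a hypothetical helper format_debug that assumes UTF-8 input. Any byte string is decoded before it meets the unicode template, so the result is always unicode and no implicit ASCII decode occurs. The str('DEBUG: %s') fix is left out because it only works on Python 2. On Python 3, formatting a bytes object with %s inserts its repr (b'...') into the message.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

html = '<html><body>helló wörld</body></html>'


def format_debug(value, encoding='utf-8'):
    """Hypothetical helper: build a unicode debug line from bytes or text."""
    # Decode byte strings explicitly so the unicode template below never
    # falls back to Python 2's implicit ASCII decode.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return 'DEBUG: %s' % value


# Works whether or not the caller already encoded html to UTF-8.
from_text = format_debug(html)
from_bytes = format_debug(html.encode('utf-8'))
```

Printing the returned unicode string still encodes it for the output stream. On Python 2 that step can raise UnicodeEncodeError when stdout has no suitable encoding, for example when output is piped.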

I hope this helps you understand the potential gotchas when using unicode strings.
