Any gotchas using unicode_literals in Python 2.6?


Question

We've already gotten our code base running under Python 2.6. In order to prepare for Python 3.0, we've started adding:


from __future__ import unicode_literals

into our .py files (as we modify them). I'm wondering if anyone else has been doing this and has run into any non-obvious gotchas (perhaps after spending a lot of time debugging).

Recommended answer

The main source of problems I've had working with unicode strings is mixing UTF-8 encoded byte strings with unicode ones.

For example, consider the following scripts.

two.py

# encoding: utf-8
name = 'helló wörld from two'

one.py

# encoding: utf-8
from __future__ import unicode_literals
import two
name = 'helló wörld from one'
print name + two.name

The output of running python one.py is:

Traceback (most recent call last):
  File "one.py", line 5, in <module>
    print name + two.name
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

In this example, two.name is a UTF-8 encoded byte string (not unicode) because two.py did not import unicode_literals, while one.name is a unicode string. When you mix the two, Python tries to convert the encoded string to unicode by decoding it with the default ascii codec, and fails on the first non-ASCII byte. It would work if you did print name + two.name.decode('utf-8').
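A minimal sketch of the decode-at-the-boundary fix, written so that it runs on both Python 2.6+ and Python 3.3+. The helper name to_text and its utf-8 default are my own assumptions, not part of the original scripts. The literals use explicit u'' prefixes so the result does not depend on whether unicode_literals is imported.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings explicitly.

    Hypothetical helper: the utf-8 default is an assumption about your data.
    On Python 2.6+, bytes is an alias for str, so this catches encoded strings.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-in for two.name: a UTF-8 encoded byte string.
encoded_name = u'helló wörld from two'.encode('utf-8')
# Stand-in for one.name: a unicode string.
name = u'helló wörld from one'

# name + encoded_name would raise UnicodeDecodeError on Python 2
# (and TypeError on Python 3); decoding first avoids the implicit ascii step.
combined = name + u' / ' + to_text(encoded_name)
```

Calling to_text on values that come from modules not yet converted to unicode_literals keeps the mixing in one place, instead of scattering .decode('utf-8') calls around the code base.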

The same thing can happen if you encode a string and later mix it with unicode strings. For example, this works:

# encoding: utf-8
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

DEBUG: <html><body>helló wörld</body></html>

But after adding the unicode_literals import it does not:

# encoding: utf-8
from __future__ import unicode_literals
html = '<html><body>helló wörld</body></html>'
if isinstance(html, unicode):
    html = html.encode('utf-8')
print 'DEBUG: %s' % html

Output:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print 'DEBUG: %s' % html
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

It fails because 'DEBUG: %s' is now a unicode string, so Python tries to decode html (a UTF-8 encoded byte string at that point) with the default ascii codec. Two ways to fix the print are print str('DEBUG: %s') % html or print 'DEBUG: %s' % html.decode('utf-8').
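A rough sketch of the second fix as a reusable function, again meant to run on both Python 2.6+ and Python 3.3+. The name debug_line is hypothetical, and treating incoming bytes as utf-8 is an assumption about the input. The idea is to keep everything unicode while formatting and to encode only once, when the result is written out.

```python
# -*- coding: utf-8 -*-

def debug_line(html, encoding='utf-8'):
    """Build a unicode debug message from html, which may be bytes or unicode.

    Hypothetical helper: decoding as utf-8 is an assumption about the input.
    """
    if isinstance(html, bytes):
        html = html.decode(encoding)
    # Both operands are unicode here, so no implicit ascii decode can happen.
    return u'DEBUG: %s' % html


page = u'<html><body>helló wörld</body></html>'
from_text = debug_line(page)
from_bytes = debug_line(page.encode('utf-8'))

# Encode exactly once, at the point where bytes are actually required.
output_bytes = from_text.encode('utf-8')
```

Both calls produce the same unicode message, so callers do not need to know whether the html they hold has already been encoded.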

I hope this helps you understand the potential gotchas when using unicode strings.
