在python源文件中包含unicode的首选方法是什么? [英] What's the preferred way to include unicode in python source files?

查看:674
本文介绍了在python源文件中包含unicode的首选方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在源代码中使用unicode字符串时,似乎有很多方法来蒙皮猫。 文档和相关PEP有大量关于可能的信息,但很少关于什么是首选。

When using unicode strings in source code, there seems to be many ways to skin a cat. The docs and the relevant PEPs have plenty of information about what's possible, but are scant about what is preferred.

例如,以下每个似乎给出相同的结果:

For example, the following each seem to give same result:

# coding: utf8
u1 = '\xe2\x82\xac'.decode('utf8')
u2 = u'\u20ac'
u3 = unichr(0x20ac)
u4 = "€".decode('utf8')
u5 = u"€"


b $ b

如果使用 __ future __ 导入,我找到了一个选项:

If using the __future__ imports, I've found one more option:

# coding: utf8
from __future__ import unicode_literals
u6 = "€"

在python我习惯了有一个明显的方法来做,所以在源文件中包含国际内容的推荐方法是什么?

In python I am used to there being one obvious way to do it, so what is the recommended method of including international content in source files?

这是一个python 2问题。

This is a python 2 question.

某些背景

方法u1,u2,u3 ,但我已经看到足够的人写这样,我认为这不只是个人偏好 - 有什么特别的原因,我们可能只想强制ascii字符在源文件,而不是指定编码,或者这只是一个习惯更可能发现在老代码躺在?

Methods u1, u2, u3 just seem silly to me, but I have seen enough people writing like this that I assume it is not just personal preference - is there any particular reason why we might want to force only ascii characters in source files, rather than specifying the encoding, or is this just a habit more likely to be found in older code lying around?

在代码中使用实际符号而不是一些转义序列有很大的可读性,并且不这样做似乎忽略了语言的优势,而不是利用python开发人员的辛勤工作。

There's huge readability improvement in the code to use the actual symbols rather than some escape sequences, and to not do so would seem to be ignoring the strengths of the language rather than taking advantage of hard work by the python devs.

推荐答案

我认为我使用的最常见的方式(在Python 2中):

I think the most common way I've used (in Python 2) is:

# coding: utf-8

text = u'résumé'




  • 文字可读。比较 text = u'r\\\ésum\\\é',其中我必须查找是什么字符。

  • 如果您使用的是Unicode,则您的变数最肯定是文字和不是二进位资料,所以没有意义。除了 unicode 对象之外的任何对象。 ('€'成为选项。)

    • The text is readable. Compare to text = u'r\u00e9sum\u00e9', where I must look up what character that is. Everything else is less readable.
    • If you're using Unicode, your variable is most certainly text and not binary data, so there's no point in keeping it in anything other than a unicode object. (Just in case '€' became an option.)
    • from __future__ import unicode_literals 更改程序的解析模式;我想你需要更加意识到文本和文本之间的区别。二进制数据。 (如果你问我,大多数程序员都不擅长。)

      from __future__ import unicode_literals changes the parsing mode of the program; I think you'd need to be more aware of the difference between text & binary data. (Something that, if you ask me, most programmers are not good at.)

      在大型项目中,解析模式只能改变一个文件,所以它可能更好作为一个所有的文件或没有文件,所以你不需要参考文件头。如果你在Python 2,默认可能关闭,除非你也瞄准Python 3.如果你的目标是Python 2.5或更旧的¹,那么它不是一个选项。

      In large projects, it might be confusing to have the parsing mode change for just one file, so it's probably better as an all files or no files, so you don't need to refer to the file header. If you're in Python 2, the default is probably off unless you're also targetting Python 3. If you're targetting Python 2.5 or older¹, then it's not an option.

      当前的大多数编辑器都支持Unicode。也就是说,我看到编辑器损坏了文件中的非ASCII字符,但极少出现;如果这样的提交的作者没有充分地审查他的代码,代码审查应该抓住这一点。 (差异将是显而易见的。)不值得支持这些人:Unicode是在这里留下;跟踪他们并修复他们的设置。注意, vim 会处理Unicode很好。

      Most editors these days are Unicode-aware. That said, I have seen editors corrupt non-ASCII characters in files, but exceedingly rarely; if the author of such a commit doesn't review his code adequately, code review should catch this. (The diff will be painfully obvious.) It is not worth supporting these people: Unicode is here to stay; track them down and fix their set up. Of note, vim handles Unicode just fine.

      ¹您应该升级。

      这篇关于在python源文件中包含unicode的首选方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆