Why declare unicode by string in python?
Question
I'm still learning Python and I have a question:
In Python 2.6.x I usually declare the encoding in the file header like this (as in PEP 0263):
# -*- coding: utf-8 -*-
After that, I write my strings as usual:
a = "A normal string without declared Unicode"
But every time I look at the code of a Python project, the encoding is not declared in the header. Instead, it is declared on every string, like this:
a = u"A string with declared Unicode"
What's the difference? What's the purpose of this? I know Python 2.6.x uses ASCII as the default encoding, but that can be overridden by the header declaration, so what's the point of declaring it per string?
Addendum: It seems I had mixed up file encoding with string encoding. Thanks for explaining it :)
Recommended answer
Those are two different things, as others have mentioned.
When you specify # -*- coding: utf-8 -*-, you're telling Python that the source file you've saved is encoded as utf-8. The default for Python 2 is ASCII (for Python 3 it's utf-8). This only affects how the interpreter reads the characters in the file.
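As a rough illustration of that point, the sketch below compiles the same source bytes under two different coding declarations. It is written for Python 3 so that it runs on a current interpreter; the helper name run_source and the choice of the character é are assumptions made only for this example.

```python
# The UTF-8 encoding of U+00E9 (e-acute) is the two bytes C3 A9.
E_ACUTE_UTF8 = "\u00e9".encode("utf-8")

def run_source(declared_encoding):
    """Compile a tiny module whose header declares the given encoding.

    Hypothetical helper: the body of the module is always the same bytes,
    only the PEP 263 coding line changes.
    """
    header = b"# -*- coding: " + declared_encoding.encode("ascii") + b" -*-\n"
    body = b"s = '" + E_ACUTE_UTF8 + b"'\n"
    namespace = {}
    exec(compile(header + body, "<example>", "exec"), namespace)
    return namespace["s"]

as_utf8 = run_source("utf-8")      # bytes C3 A9 read as one character
as_latin1 = run_source("latin-1")  # the same bytes read as two characters

print(repr(as_utf8), repr(as_latin1))
```

The declaration changes only how the bytes on disk are turned into characters; it says nothing about whether a given literal is a byte string or a Unicode string.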
In general, it's probably not the best idea to embed non-ASCII characters directly in your source file, whatever its encoding; you can use Unicode escapes in string literals instead, which work under either encoding.
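A minimal sketch of the escape approach, written for Python 3 (in Python 2 the same escapes need the u prefix on the literal). The variable names are illustrative only.

```python
# Three ASCII-only spellings of the same character, U+2665.
by_code_point = "\u2665"
by_name = "\N{BLACK HEART SUIT}"
by_function = chr(0x2665)

# The text of a source line that uses the escape contains only ASCII,
# so it reads the same under an ASCII or a UTF-8 coding declaration.
source_line = 'heart = "\\u2665"'
is_ascii_only = all(ord(ch) < 128 for ch in source_line)

print(by_code_point == by_name == by_function, is_ascii_only)
```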
When you declare a string with a u in front, like u'This is a string', it tells the Python compiler that the string is Unicode, not bytes. The interpreter handles this mostly transparently; the most obvious difference is that you can now embed Unicode characters in the string (that is, u'\u2665' is now legal). You can use from __future__ import unicode_literals to make it the default.
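The Unicode-versus-bytes distinction can be sketched as follows. This assumes Python 3, where a plain literal already behaves like a Python 2 u'...' literal and a bytes object plays the role of a Python 2 plain string; the variable names are made up for the example.

```python
# A Unicode string holds code points: the heart is a single character.
heart_text = "\u2665"

# Encoding it produces bytes: in UTF-8 the same heart takes three bytes.
heart_bytes = heart_text.encode("utf-8")

text_length = len(heart_text)    # counts characters
byte_length = len(heart_bytes)   # counts bytes

print(text_length, byte_length, heart_bytes)
```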
This only applies to Python 2; in Python 3 the default is Unicode, and you need to put a b in front (like b'These are bytes') to declare a sequence of bytes.
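For the Python 3 side, a short sketch of how the two types relate: text becomes bytes through encode, bytes become text through decode, and the two cannot be concatenated directly. The sample values are assumptions chosen for illustration.

```python
text = "caf\u00e9"             # Unicode string (the Python 3 default)
data = text.encode("utf-8")    # explicit conversion to bytes
round_trip = data.decode("utf-8")

literal_bytes = b"These are bytes"

# Mixing the two types is an error rather than an implicit conversion.
try:
    text + data
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(data, round_trip == text, mixing_allowed)
```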