Why declare unicode by string in Python?


Question

I'm still learning Python and I have a question:

In Python 2.6.x I usually declare the encoding in the file header like this (as in PEP 0263):

# -*- coding: utf-8 -*-

After that, my strings are written as usual:

a = "A normal string without declared Unicode"

But every time I look at the code of a Python project, the encoding is not declared in the header. Instead, it is declared on every string like this:

a = u"A string with declared Unicode"

What's the difference? What's the purpose of this? I know Python 2.6.x uses ASCII encoding by default, but that can be overridden by the header declaration, so what's the point of the per-string declaration?

Addendum: It seems I mixed up file encoding with string encoding. Thanks for explaining it :)

Recommended answer

Those are two different things, as others have mentioned.

When you specify # -*- coding: utf-8 -*-, you're telling Python that the source file you've saved is UTF-8. The default for Python 2 is ASCII (for Python 3 it's UTF-8). This only affects how the interpreter reads the characters in the file.
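To see why the declared source encoding matters, here is a minimal sketch that mimics what the interpreter has to do when it reads a file: turn bytes on disk into characters using some codec. The variable names are only illustrative, and the snippet decodes a byte string by hand rather than reading a real source file.

```python
# The same two bytes, as they might sit in a saved source file.
raw = b"\xc3\xa9"

# Decoded as UTF-8 they form one character (U+00E9).
as_utf8 = raw.decode("utf-8")

# Decoded as Latin-1 they form two characters (U+00C3, U+00A9).
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), ascii(as_utf8))
print(len(as_latin1), ascii(as_latin1))
```

The coding line plays the role of the codec name here: it says which interpretation of the file's bytes is the intended one, and nothing more.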

In general, it's probably not the best idea to embed high Unicode characters into your file no matter what the encoding is; you can use Unicode string escapes instead, which work in either encoding.
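As a small illustration of the escape approach, the sketch below writes a non-ASCII character using only ASCII characters in the source. The name `heart` is just an example; the `u` prefix is accepted by Python 2 and by Python 3.3 and later.

```python
import unicodedata

# The source text of this literal is pure ASCII, so it reads the same
# whether the file is treated as ASCII or as UTF-8.
heart = u"\u2665"

print(len(heart))
print(unicodedata.name(heart))
```

Because the literal contains no bytes above 127, the file decodes identically under either source encoding.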

When you declare a string with a u in front, like u'This is a string', it tells the Python compiler that the string is Unicode, not bytes. This is handled mostly transparently by the interpreter; the most obvious difference is that you can now embed Unicode characters in the string (that is, u'\u2665' is now legal). You can use from __future__ import unicode_literals to make it the default.
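The difference between Unicode text and bytes can be made visible by encoding a Unicode string and comparing the two objects. This is only a sketch with illustrative names (`text`, `data`); the type names it prints depend on whether it runs under Python 2 or Python 3.

```python
# Unicode text: a sequence of code points.
text = u"\u2665 heart"

# Bytes: one particular encoded form of that text.
data = text.encode("utf-8")

# Lengths differ because U+2665 takes three bytes in UTF-8.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# Decoding with the same codec recovers the original text.
round_trip = data.decode("utf-8")
```

The text has 7 characters while its UTF-8 form has 9 bytes, which is why the two kinds of string have to be kept apart.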

This only applies to Python 2; in Python 3 the default is Unicode, and you need to specify a b in front (like b'These are bytes') to declare a sequence of bytes.
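For Python 3 specifically, a short sketch of the two literal forms follows. It assumes a Python 3 interpreter, and the variable names are illustrative.

```python
# In Python 3 an unprefixed literal is Unicode text (str).
text = "These are characters"

# A b prefix produces a bytes object instead.
data = b"These are bytes"

print(type(text).__name__)
print(type(data).__name__)

# Python 3 refuses to mix the two implicitly.
mix_error = None
try:
    text + data
except TypeError as exc:
    mix_error = exc

# An explicit decode turns the bytes into text.
decoded = data.decode("ascii")
```

Any conversion between the two has to name a codec explicitly, unlike Python 2, where mixing str and unicode triggered an implicit ASCII conversion.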
