Why declare unicode by string in python?

Published: 2021/12/27 15:22:35 | Tags: python encoding utf-8

Question

I'm still learning Python and I have a question:

In Python 2.6.x I usually declare the encoding in the file header like this (as described in PEP 263):

# -*- coding: utf-8 -*-

After that, my strings are written as usual:

a = "A normal string without declared Unicode"

But every time I look at Python project code, the encoding is not declared in the header. Instead, it is declared on every string, like this:

a = u"A string with declared Unicode"

What's the difference? What's the purpose of this? I know Python 2.6.x uses ASCII encoding by default, but that can be overridden by the header declaration, so what's the point of a per-string declaration?

Addendum: It seems I had mixed up file encoding with string encoding. Thanks for explaining it :)

Recommended answer

Those are two different things, as others have mentioned.

When you specify # -*- coding: utf-8 -*-, you're telling Python that the source file you've saved is encoded as UTF-8. The default for Python 2 is ASCII (for Python 3 it is UTF-8). This only affects how the interpreter decodes the characters in the file.
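To see that the coding declaration only changes how the file's bytes are decoded, here is a minimal sketch. It assumes Python 3 and uses `compile()` on raw bytes as a stand-in for a saved source file. The helper `run_source` is a hypothetical name used only for illustration. The same two bytes on disk become different characters depending on the declared encoding.

```python
# Python 3 sketch: the coding cookie only tells the compiler how to
# decode the source bytes; it does not change what kind of string a literal is.

def run_source(source_bytes):
    """Hypothetical helper: compile raw source bytes and return the value of `a`."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["a"]

# The same bytes on disk: 0xC3 0xA9 is the UTF-8 encoding of "é".
body = b'a = "caf\xc3\xa9"\n'

as_utf8 = run_source(b"# -*- coding: utf-8 -*-\n" + body)
as_latin1 = run_source(b"# -*- coding: latin-1 -*-\n" + body)

print(ascii(as_utf8))    # 'caf\xe9'      (two bytes decoded as one character)
print(ascii(as_latin1))  # 'caf\xc3\xa9'  (same two bytes read as two characters)
```

The literal itself is identical in both cases. Only the decoding of the file changed, which is why the header declaration and the `u` prefix are answering different questions.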

In general, it's probably not the best idea to embed non-ASCII characters directly in your source file, whatever its declared encoding. You can use Unicode escape sequences in string literals instead. These are plain ASCII, so they work under either encoding.
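As a rough illustration, assuming Python 3 (in Python 2, `\u` escapes are only interpreted inside `u''` literals), an escaped literal keeps the source text pure ASCII while still producing the non-ASCII character:

```python
import unicodedata

# What you would literally type in the file: six ASCII characters for the escape.
source_line = 's = "\\u2665"'
print(source_line.isascii())  # True

# Executing that ASCII-only source still yields the non-ASCII character.
namespace = {}
exec(source_line, namespace)
heart = namespace["s"]

print(heart == chr(0x2665))                # True
print(unicodedata.name(heart))             # BLACK HEART SUIT
print("\N{BLACK HEART SUIT}" == heart)     # True: named escapes are ASCII too
```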

When you declare a string with a u in front, like u'This is a string', it tells the Python compiler that the string is Unicode, not bytes. This is handled mostly transparently by the interpreter. The most obvious difference is that you can now embed Unicode escapes in the string: u'\u2665' is interpreted as the single character U+2665, whereas in a plain byte string \u is not an escape and is kept literally. You can use from __future__ import unicode_literals to make this the default.
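For readers on Python 3, here is a small sketch of what the `u` prefix means today. The prefix is still accepted (PEP 414) but is redundant. The example also shows the practical difference between a Unicode string and its encoded bytes: the string counts characters, while its UTF-8 encoding counts bytes.

```python
# Python 3: u"..." and "..." produce the same type and value.
with_prefix = u"caf\u00e9"
without_prefix = "caf\u00e9"
print(type(with_prefix) is type(without_prefix) is str)  # True
print(with_prefix == without_prefix)                      # True

# A Unicode string counts code points; its UTF-8 encoding counts bytes.
heart = u"\u2665"
print(len(heart))                  # 1
print(len(heart.encode("utf-8")))  # 3 (0xE2 0x99 0xA5)
```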

This only applies to Python 2. In Python 3, string literals are Unicode by default, and you need to put a b in front (as in b'These are bytes') to declare a sequence of bytes.
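A minimal Python 3 sketch of the `b` prefix follows. `str` and `bytes` are distinct types that Python 3 refuses to mix implicitly, so conversion happens explicitly with `encode()` and `decode()`. The choice of ASCII and UTF-8 as codecs here is just an assumption for the example.

```python
text = "These are characters"   # str: a sequence of Unicode code points
data = b"These are bytes"       # bytes: a sequence of integers 0..255

print(type(text).__name__, type(data).__name__)  # str bytes
print(data[0])                                   # 84 (the byte value of "T")

# Python 3 refuses to mix the two implicitly...
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)

# ...so you convert explicitly at the boundary.
combined = text + " / " + data.decode("ascii")
round_trip = combined.encode("utf-8")
print(combined)                    # These are characters / These are bytes
print(type(round_trip).__name__)   # bytes
```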
