How are these strings represented internally in the Python interpreter? I don't understand

Views: 59 Posted: 2021/6/26 19:03:11 Tags: python string unicode python-2.7


Question

# -*- coding: utf-8 -*-

a = 'éáűőúöüó€'
print type(a)    # <type 'str'>
print a          # éáűőúöüó€
print ord(a[-1]) # 172

Why does this work? Shouldn't this be a SyntaxError: Non-ASCII character '\xc3' in file ...? There are Unicode literals in the string.

When I prefix it with u, the results are different:

# -*- coding: utf-8 -*-

a = u'éáűőúöüó€'
print type(a)    # <type 'unicode'>
print a          # éáűőúöüó€
print ord(a[-1]) # 8364

Why? What is the difference between the internal representations in Python? How can I see it myself? :)

Recommended answer

"There are Unicode literals in the string."

No, there are not. There are bytes in the string. Python simply goes with the bytes your editor saved to disk when you created the file. The SyntaxError you expected is only raised when the file has no encoding declaration; yours has one on the first line.

When you prefix the string with u, you signal to Python that you are creating a unicode object instead. Python now pays attention to the encoding you specified at the top of your source file, and decodes the bytes in the source file to a unicode object based on that encoding.
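To see the difference for yourself, here is a minimal sketch that should run on both Python 2.7 and Python 3. It assumes UTF-8 throughout, the variable names are illustrative only, and the byte string is built explicitly with encode() to stand in for what the editor wrote to disk:

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; the names below are not from the original post.

# The same characters as éáűőúöüó€, written as escapes so the example
# does not depend on how this file itself was saved.
text = u'\xe9\xe1\u0171\u0151\xfa\xf6\xfc\xf3\u20ac'  # unicode object: code points
raw = text.encode('utf-8')                            # the bytes a UTF-8 editor saves

print(len(text))                    # 9 code points
print(len(raw))                     # 19 bytes: eight 2-byte characters plus 3 for the euro sign
print(repr(raw))                    # the individual bytes, shown as \x.. escapes
print(raw.decode('utf-8') == text)  # True: decoding with the declared encoding round-trips
```

Indexing raw addresses single bytes, while indexing text addresses whole characters, which is why a[-1] differs between the two versions in the question.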

In both cases, your editor saved a series of bytes to a file. For the € character, the UTF-8 encoding is three bytes, represented in hexadecimal as E2 82 AC. The last byte in the byte string is thus AC, or 172 in decimal. Once you decode those last 3 bytes as UTF-8, they together become the Unicode code point U+20AC, which is 8364 in decimal.
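The arithmetic can be checked directly. The following is a small sketch, not part of the original answer, that should work on Python 2.7 and Python 3; the manual bit manipulation follows the standard UTF-8 three-byte layout (1110xxxx 10xxxxxx 10xxxxxx), and the variable names are made up for illustration:

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; the names below are not from the original post.
euro = u'\u20ac'                           # the euro sign, U+20AC
encoded = bytearray(euro.encode('utf-8'))  # bytearray yields ints on Python 2 and 3

print([hex(b) for b in encoded])           # ['0xe2', '0x82', '0xac']
print(encoded[-1])                         # 172, the last byte (0xAC)
print(ord(euro))                           # 8364, the code point (0x20AC)

# Decode the three bytes by hand: strip the UTF-8 marker bits and
# join the remaining 4 + 6 + 6 payload bits.
b1, b2, b3 = encoded
code_point = ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
print(code_point)                          # 8364
```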

You really should read up on Python and Unicode:

Python Unicode HOWTO

Pragmatic Unicode by Ned Batchelder

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
