Unicode subscripts and superscripts in identifiers, why does Python consider Xu == Xᵘ == Xᵤ?

Published: 2020/11/26 3:08:57 Tags: python unicode syntax identifier

Question

Python allows Unicode identifiers. I defined Xᵘ = 42, expecting Xu and Xᵤ to result in a NameError. But in reality, when I define Xᵘ, Python (silently?) turns Xᵘ into Xu, which strikes me as a somewhat unpythonic thing to do. Why is this happening?

>>> Xᵘ = 42
>>> print((Xu, Xᵘ, Xᵤ))
(42, 42, 42)

Recommended answer

Python converts all identifiers to their NFKC normal form; from the Identifiers section of the reference documentation:

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.

The NFKC form of both the superscript and the subscript character is the lowercase u:

>>> import unicodedata
>>> unicodedata.normalize('NFKC', 'Xᵘ Xᵤ')
'Xu Xu'
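To see why both characters collapse to the same letter, it can help to look at their compatibility decompositions. The following is a minimal sketch using only the standard unicodedata module; the helper name describe and the two constants are made up for this illustration. It compares NFC (which leaves the characters alone) with NFKC (which applies the compatibility mapping):

```python
import unicodedata

# The two characters from the question, written as escapes to avoid ambiguity.
SUPERSCRIPT_U = "\u1d58"  # the raised u in Xᵘ
SUBSCRIPT_U = "\u1d64"    # the lowered u in Xᵤ


def describe(ch):
    """Illustrative helper: collect the normalization data for one character."""
    return {
        "decomposition": unicodedata.decomposition(ch),
        "NFC": unicodedata.normalize("NFC", ch),
        "NFKC": unicodedata.normalize("NFKC", ch),
    }


for ch in (SUPERSCRIPT_U, SUBSCRIPT_U):
    # The decomposition field carries a compatibility tag followed by the
    # code point of the plain letter (0075 is the ASCII lowercase u).
    print(f"U+{ord(ch):04X}", describe(ch))
```

Only the compatibility forms (NFKC, NFKD) fold these characters into a plain u; the canonical forms keep them distinct, which is why the choice of NFKC for identifiers matters here.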

So in the end, all you have is a single identifier, Xu:

>>> import dis
>>> dis.dis(compile('Xᵘ = 42\nprint((Xu, Xᵘ, Xᵤ))', '', 'exec'))
  1           0 LOAD_CONST               0 (42)
              2 STORE_NAME               0 (Xu)

  2           4 LOAD_NAME                1 (print)
              6 LOAD_NAME                0 (Xu)
              8 LOAD_NAME                0 (Xu)
             10 LOAD_NAME                0 (Xu)
             12 BUILD_TUPLE              3
             14 CALL_FUNCTION            1
             16 POP_TOP
             18 LOAD_CONST               1 (None)
             20 RETURN_VALUE

The disassembly above shows that the identifiers had already been normalised by the time the bytecode was produced. This happens during parsing: every identifier is normalised when the AST (Abstract Syntax Tree) is created, and the compiler then generates the bytecode from that AST.
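The same thing can be observed one step earlier, on the AST itself. This is a small sketch, assuming CPython's standard ast module; the variable names source, names and namespace are chosen just for the example:

```python
import ast

# Same program as in the question, with the special characters as escapes.
source = "X\u1d58 = 42\nprint((Xu, X\u1d58, X\u1d64))"

tree = ast.parse(source)

# Collect every identifier that appears as a Name node in the tree.
names = sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})
print(names)  # ['Xu', 'print']

# Run the parsed program and inspect which key actually got stored.
namespace = {}
exec(compile(tree, "<sketch>", "exec"), namespace)
print("Xu" in namespace)        # True
print("X\u1d58" in namespace)   # False
```

Note that the normalisation applies to identifiers in source code only. Plain string lookups, such as indexing the namespace dictionary with the un-normalised spelling, are ordinary string comparisons and do not find the name.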

Identifiers are normalised to avoid many potential 'look-alike' bugs. Otherwise you could end up using both ﬁnd() (spelled with the U+FB01 LATIN SMALL LIGATURE FI character followed by the ASCII characters nd) and find(), and wonder why your code has a bug.
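The ligature case can be checked directly. The sketch below is only an illustration: the constant LIGATURE_FIND and the tiny function body are invented for the example. It defines a function whose name is spelled with the ligature and then calls it through the plain ASCII spelling:

```python
import unicodedata

# "find" spelled with U+FB01 (the fi ligature) followed by ASCII "nd".
LIGATURE_FIND = "\ufb01nd"

# As plain strings the two spellings differ...
print(LIGATURE_FIND == "find")                                 # False
# ...but they agree once NFKC is applied.
print(unicodedata.normalize("NFKC", LIGATURE_FIND) == "find")  # True

# Define the function under the ligature spelling, call it via the ASCII one.
namespace = {}
exec(f"def {LIGATURE_FIND}():\n    return 'found'\n\nresult = find()", namespace)
print(namespace["result"])  # found
```

Without normalisation the call to find() would raise a NameError, even though the two spellings can look identical in many fonts.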
