Attribute access on int literals
Question
>>> 1 .__hash__()
1
>>> 1.__hash__()
File "<stdin>", line 1
1.__hash__()
^
SyntaxError: invalid syntax
It has been covered here before that the second example doesn't work because the "1." is parsed as the start of a float literal.
My question is: why doesn't Python parse this as attribute access on an int when the interpretation as a float is a syntax error? The docs section on lexical analysis seems to suggest whitespace is only required when other interpretations are ambiguous, but perhaps I'm reading that section wrong.
On a hunch, it seems like the lexer is greedy (taking the largest token possible), but I have no source for this claim.
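For context, there are a few standard ways to make the attribute access parse; a quick sketch of each (plain CPython behaviour, nothing beyond the language itself):

```python
# Ways to call a method on an int literal without a SyntaxError.

# 1. A space before the dot: "1 ." cannot continue a float literal,
#    so the lexer emits NUMBER(1), OP(.), NAME(__hash__).
print(1 .__hash__())    # 1

# 2. Parentheses close the literal before the dot is read.
print((1).__hash__())   # 1

# 3. Two dots: "1." is a float literal and the second dot is the
#    attribute access -- note this hashes 1.0, not the int 1.
print(1..__hash__())    # 1  (hash(1.0) == hash(1))
```

The parenthesized form is the one usually recommended, since it reads least ambiguously.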
Answer
Read the docs closely; they say:
Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens).
1.__hash__()
is tokenized as:
import io, tokenize
for token in tokenize.tokenize(io.BytesIO(b"1.__hash__()").readline):
    print(token.string)
#>>> utf-8
#>>> 1.
#>>> __hash__
#>>> (
#>>> )
#>>>
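By contrast, tokenizing the version with a space shows the number and the dot as separate tokens (same tokenize-based sketch, with the empty-string NEWLINE/ENDMARKER tokens filtered out for readability):

```python
import io, tokenize

# "1 ." cannot be extended into a float token, so the dot becomes
# its own OP token and __hash__ a NAME token.
for token in tokenize.tokenize(io.BytesIO(b"1 .__hash__()").readline):
    if token.string:
        print(token.string)
#>>> utf-8
#>>> 1
#>>> .
#>>> __hash__
#>>> (
#>>> )
```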
Python's lexer will choose the token comprising the longest possible string that forms a legal token, when read from left to right; after tokenization, no two adjacent tokens may be combined into a valid token. The logic is very similar to that in your other question.
The confusion seems to come from not recognizing the tokenizing step as a completely distinct step. If the grammar allowed splitting up tokens solely to make the parser happy, then surely you'd expect
_ or1.
to tokenize as
_
or
1.
but there is no such rule, so it tokenizes as
_
or1
.
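This can be checked with the same tokenize module (a small sketch; "or1" is a legal identifier, so maximal munch keeps it as one NAME token rather than splitting it into the keyword "or" followed by "1."):

```python
import io, tokenize

# Tokenize "_ or1." and collect the non-empty token strings.
toks = [t.string
        for t in tokenize.tokenize(io.BytesIO(b"_ or1.").readline)
        if t.string]
print(toks)
#>>> ['utf-8', '_', 'or1', '.']
```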