Are ">>"s in type parameters tokenized using a special rule?
Question
I'm confused by the Java spec about how this code should be tokenized:
ArrayList<ArrayList<Integer>> i;
The spec says:
The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would.
As I understand it, applying the "longest match" rule would result in the tokens:
- ArrayList
- <
- ArrayList
- <
- Integer
- >>
- i
- ;
which would not parse. But of course this code is parsed just fine.
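To make the concern concrete, here is a minimal sketch of a longest-match tokenizer. It is not taken from any real compiler: the class name LongestMatchLexer, the tokenize method, and the reduced operator table are all assumptions made for illustration, and only identifiers plus a handful of operators are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not code from javac or any other compiler.
class LongestMatchLexer {
    // Operators listed longest first, so the first prefix hit is the longest match.
    private static final String[] OPERATORS = {
        ">>>=", ">>>", ">>=", ">>", ">=", ">", "<", ";"
    };

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else {
                String op = null;
                for (String candidate : OPERATORS) {
                    if (src.startsWith(candidate, i)) {
                        op = candidate; // longest operator that matches at position i
                        break;
                    }
                }
                if (op == null) {
                    throw new IllegalArgumentException("Unexpected character: " + c);
                }
                tokens.add(op);
                i += op.length();
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ArrayList, <, ArrayList, <, Integer, >>, i, ;]
        System.out.println(tokenize("ArrayList<ArrayList<Integer>> i;"));
    }
}
```

The two closing angle brackets come out as a single >> token, which is exactly the case the question is about.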
What is the correct specification for this case?
Does this mean that a correct lexer must be context-free? That doesn't seem possible with a regular lexer.
Recommended answer
Based on reading the code linked by @sm4, it looks like the strategy is:
- Tokenize the input normally. So A<B<C>> i; would be tokenized as A, <, B, <, C, >>, i, ; (8 tokens, not 9).
- During hierarchical parsing, when the parser is working on generics and needs a >, check whether the next token starts with > (that is, >>, >>>, >=, >>=, or >>>=). If it does, just knock the > off and push the shortened token back onto the token stream. Example: when the parser gets to >>, i, ; while working on the typeArguments rule, it successfully parses typeArguments, and the remaining token stream is now the slightly different >, i, ; since the first > of >> was pulled off to match typeArguments.
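The strategy above can be sketched roughly as follows. This is a toy recursive-descent parser, not the code linked by @sm4: the class TypeArgumentsParser, its methods parseType, expectClosingAngle and remaining, and the simplified grammar are all hypothetical names and assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; a real Java parser handles far more syntax.
class TypeArgumentsParser {
    private final Deque<String> tokens;

    TypeArgumentsParser(List<String> tokens) {
        this.tokens = new ArrayDeque<>(tokens);
    }

    // Assumed toy grammar: type := Identifier [ '<' type { ',' type } '>' ]
    void parseType() {
        String name = tokens.poll();
        if (name == null || !Character.isJavaIdentifierStart(name.charAt(0))) {
            throw new IllegalStateException("Expected identifier but found " + name);
        }
        if ("<".equals(tokens.peek())) {
            tokens.poll(); // consume '<'
            parseType();
            while (",".equals(tokens.peek())) {
                tokens.poll(); // consume ','
                parseType();
            }
            expectClosingAngle();
        }
    }

    // Accepts any token that starts with '>' and pushes the rest back, if any.
    private void expectClosingAngle() {
        String next = tokens.poll();
        if (next == null || !next.startsWith(">")) {
            throw new IllegalStateException("Expected '>' but found " + next);
        }
        if (next.length() > 1) {
            tokens.push(next.substring(1)); // e.g. ">>" leaves ">" at the head of the stream
        }
    }

    List<String> remaining() {
        return List.copyOf(tokens);
    }

    public static void main(String[] args) {
        TypeArgumentsParser parser = new TypeArgumentsParser(
            List.of("A", "<", "B", "<", "C", ">>", "i", ";"));
        parser.parseType();
        System.out.println(parser.remaining()); // prints [i, ;]
    }
}
```

In this sketch, closing the inner B<C> consumes only the first > of >> and leaves >, i, ; in the stream, so the outer A<...> can then close normally. The lexer itself stays a plain longest-match lexer; only the parser knows when a > is wanted.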
So although tokenization does happen normally, some re-tokenization occurs in the hierarchical parsing phase, if necessary.