How to implement JavaScript automatic semicolon insertion in JavaCC?


Question


I am finishing my ECMAScript 5.1/JavaScript grammar for JavaCC. I've done all the tokens and productions according to the specification.

Now I'm facing a big question which I don't know how to solve.

JavaScript has this nice feature of the automatic semicolon insertion:

What are the rules for JavaScript's automatic semicolon insertion (ASI)?

To quote the specifications, the rules are:

There are three basic rules of semicolon insertion:

  1. When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:

    • The offending token is separated from the previous token by at least one LineTerminator.
    • The offending token is }.
  2. When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.

  3. When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation [no LineTerminator here] within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 12.6.3).

How could I implement this with JavaCC?

The closest thing to an answer I've found so far is this grammar from the Dojo toolkit, which has a JAVACODE part called insertSemiColon dedicated to the task. But I don't see this method being called anywhere (neither in the grammar nor in the rest of the jslinker code).

How could I approach this problem with JavaCC?

See also this question:

javascript grammar and automatic semocolon insertion

(No answer there.)

A question from the comments:

Is it correct to say that semicolons need only be inserted where semicolons are syntactically allowed?

I think it would be correct to say that semicolons need only be inserted where semicolons are syntactically required.

The relevant part here is §7.9:

7.9 Automatic Semicolon Insertion

Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.

Let's take the return statement for instance:

ReturnStatement :
    return ;
    return [no LineTerminator here] Expression ;

So (from my understanding) syntactically the semicolon is required, not just allowed (as in your question).

Solution

The three rules for semicolon insertion can be found in section 7.9.1 of the ECMAScript 5.1 standard.

I think rules 1 and 2 from the standard can be handled with semantic lookahead.

void PossiblyInsertedSemicolon() 
{}
{
    LOOKAHEAD( {semicolonNeedsInserting()} ) {}
|
    ";"
}

So when does a semicolon need inserting? When any one of these is true:

  • When the next token is not a semicolon and is on another line (getToken(1).kind != SEMICOLON && getToken(0).endLine < getToken(1).beginLine).
  • When the next token is a right brace.
  • When the next token is EOF.

So we need

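// Assumes this method is declared in the parser class body (between
// PARSER_BEGIN and PARSER_END) and that the grammar names its ";" and "}"
// tokens SEMICOLON and RBRACE.
boolean semicolonNeedsInserting() {
    return (getToken(1).kind != SEMICOLON && getToken(0).endLine < getToken(1).beginLine)
        || getToken(1).kind == RBRACE
        || getToken(1).kind == EOF;
}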

That takes care of rules 1 and 2 of the standard.
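
To see how this predicate behaves without generating a parser, here is a minimal sketch in plain Java. It is only a model: `AsiPredicate`, `Tok`, and the numeric token kinds are hypothetical stand-ins for the parser class, the `Token` class, and the token-kind constants that JavaCC generates. `prev` plays the role of `getToken(0)` and `next` the role of `getToken(1)`.

```java
public class AsiPredicate {
    // Hypothetical token kinds; a JavaCC parser takes these from its generated constants.
    static final int EOF = 0, SEMICOLON = 1, RBRACE = 2, IDENTIFIER = 3;

    // Stand-in for JavaCC's Token, reduced to the fields the predicate reads.
    static final class Tok {
        final int kind, beginLine, endLine;

        Tok(int kind, int beginLine, int endLine) {
            this.kind = kind;
            this.beginLine = beginLine;
            this.endLine = endLine;
        }
    }

    // prev models getToken(0) (the last consumed token); next models getToken(1).
    static boolean semicolonNeedsInserting(Tok prev, Tok next) {
        return (next.kind != SEMICOLON && prev.endLine < next.beginLine)
            || next.kind == RBRACE
            || next.kind == EOF;
    }

    public static void main(String[] args) {
        Tok a = new Tok(IDENTIFIER, 1, 1);
        // "a <newline> b": the next token starts on a later line.
        System.out.println(semicolonNeedsInserting(a, new Tok(IDENTIFIER, 2, 2)));
        // "a b": same line, and the next token is neither "}" nor EOF.
        System.out.println(semicolonNeedsInserting(a, new Tok(IDENTIFIER, 1, 1)));
        // "a }": a closing brace on the same line.
        System.out.println(semicolonNeedsInserting(a, new Tok(RBRACE, 1, 1)));
        // "a <newline> ;": an explicit semicolon on the next line.
        System.out.println(semicolonNeedsInserting(a, new Tok(SEMICOLON, 2, 2)));
    }
}
```

The last case is why the first clause tests `kind != SEMICOLON`: when an explicit semicolon is present, the predicate is false and the `";"` alternative of PossiblyInsertedSemicolon consumes it instead.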

For rule 3 (restricted productions), as mentioned in my answer to this question, you could do the following:

void returnStatement()
{}
{
    "return"
    [   // Parse an expression unless either the next token is a ";", "}" or EOF, or the next token is on another line.
        LOOKAHEAD( {   getToken(1).kind != SEMICOLON
                    && getToken(1).kind != RBRACE
                    && getToken(1).kind != EOF
                    && getToken(0).endLine == getToken(1).beginLine} )
        Expression()
    ]
    PossiblyInsertedSemicolon() 
}
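
The effect of that guard on a `return` followed by a line break can be modelled in plain Java. This is a sketch under assumptions, not generated parser code: `ReturnAsi`, `Tok`, `expressionFollows`, `describe`, and the numeric token kinds are hypothetical names, and the two boolean methods simply restate the LOOKAHEAD condition and the semicolon-insertion test in terms of the token after `return`.

```java
public class ReturnAsi {
    // Hypothetical token kinds; a JavaCC parser takes these from its generated constants.
    static final int EOF = 0, SEMICOLON = 1, RBRACE = 2, RETURN = 3, IDENTIFIER = 4;

    // Stand-in for JavaCC's Token, reduced to the fields the checks read.
    static final class Tok {
        final int kind, beginLine, endLine;

        Tok(int kind, int beginLine, int endLine) {
            this.kind = kind;
            this.beginLine = beginLine;
            this.endLine = endLine;
        }
    }

    // Models the LOOKAHEAD guard in front of Expression() in returnStatement().
    static boolean expressionFollows(Tok prev, Tok next) {
        return next.kind != SEMICOLON
            && next.kind != RBRACE
            && next.kind != EOF
            && prev.endLine == next.beginLine;
    }

    // Models the semicolon-insertion test used for rules 1 and 2.
    static boolean semicolonNeedsInserting(Tok prev, Tok next) {
        return (next.kind != SEMICOLON && prev.endLine < next.beginLine)
            || next.kind == RBRACE
            || next.kind == EOF;
    }

    // Describes how the token after "return" would be treated.
    static String describe(Tok ret, Tok next) {
        if (expressionFollows(ret, next)) return "return <expression>";
        if (next.kind == SEMICOLON) return "return ;";
        return semicolonNeedsInserting(ret, next) ? "return ; (inserted)" : "syntax error";
    }

    public static void main(String[] args) {
        Tok ret = new Tok(RETURN, 1, 1);
        System.out.println(describe(ret, new Tok(IDENTIFIER, 1, 1))); // "return a" on one line
        System.out.println(describe(ret, new Tok(IDENTIFIER, 2, 2))); // "return", line break, "a"
        System.out.println(describe(ret, new Tok(RBRACE, 1, 1)));     // "return }"
        System.out.println(describe(ret, new Tok(SEMICOLON, 1, 1)));  // "return ;"
    }
}
```

The second case is the one rule 3 exists for: the expression on the next line is not attached to the `return`, and a semicolon is inserted before it. The other restricted productions in ECMAScript 5.1 (continue, break, throw, and postfix ++ and --) would presumably need a similar same-line guard.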
