Automatic Semicolon Insertion in JavaScript without parsing


Question

I'm writing a JavaScript preprocessor that automatically inserts semicolons wherever they are necessary. Don't ask why.

I know that the general way to tackle this problem is to write a JavaScript parser and add semicolons where necessary according to the rules in the spec. However, I don't want to do that, for the following reasons:


  1. I don't want to write a full-fledged parser.

  2. I want to preserve comments and whitespace.

I've already (correctly) implemented the second and third rules for automatic semicolon insertion using a simple scanner.

The first rule, however, is proving to be more of a challenge to implement. So I have three questions:


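  1. Is it possible to implement the first rule using a simple scanner with lookahead and lookbehind?

  2. If it is possible, has somebody already done so?

  3. If not, how should I tackle this problem?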

For the sake of completeness, here are the three rules:



  • When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:


    1. The offending token is separated from the previous token by at least one LineTerminator.

    2. The offending token is }.


  • When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.

  • When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation "[no LineTerminator here]" within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (section 12.6.3).
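To make the rules concrete, here is a small sketch of the first and third rules in action. The function names are made up for illustration; the behavior shown is what the quoted rules prescribe.

```javascript
// Rule 1, line-break case: the second `const` cannot follow `1` under any
// production, and a line break separates the two tokens, so a semicolon is
// inserted before the offending token.
function ruleOne() {
  const a = 1
  const b = 2
  return a + b
}

// Rule 1, `}` case: `}` cannot follow `7` directly, so a semicolon is
// inserted before it even though there is no line break.
function closingBrace() { return 7 }

// Rule 3: `return` is a restricted production. A line break after `return`
// forces a semicolon, so the function returns undefined and `42` becomes a
// separate, unreachable expression statement.
function ruleThree() {
  return
  42
}

console.log(ruleOne(), closingBrace(), ruleThree()); // prints: 3 7 undefined
```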


Recommended answer

There is no way to achieve what you want with a scanner (tokenizer) alone. To answer "do we need a semicolon here?" you have to answer "is the next token an offending token?", and to answer that you need a JavaScript grammar, because an offending token is defined as something the grammar doesn't allow at that position.
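A short illustration of why the decision depends on the grammar: in both functions below a line break follows `double`, but only one of them gets a semicolon. The names are hypothetical and chosen only for this sketch.

```javascript
// `(` is allowed after `double` (it starts a call), so it is NOT an
// offending token and no semicolon is inserted: the two lines are parsed
// as `const y = double(21)`.
function noInsertion() {
  const double = (x) => x * 2
  const y = double
  (21)
  return y
}

// `const` is not allowed after `double`, so it IS an offending token and a
// semicolon is inserted before it: `y` is the function itself.
function insertion() {
  const double = (x) => x * 2
  const y = double
  const z = 21
  return [typeof y, z]
}

console.log(noInsertion()); // prints: 42
console.log(insertion());   // prints: [ 'function', 21 ]
```

The same problem shows up with other tokens. For example, a scanner that sees `}` followed by a line break and `(` cannot tell, without grammatical context, whether the `}` closed a block (a new statement starts) or an object literal (the `(` continues the expression as a call).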

I had some success with creating a list of all tokens and then processing that list in a second step (so that I would have some context). Using this approach, you can fix some places by writing code like this:


  • Iterate over the tokens backwards (starting with the last one and going towards the start of the file)
  • If the current token is IF, FOR, WHILE, VAR etc.:
    • Skip the whitespace and comments before the token
    • If the token you land on is not ;, then insert one
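The steps above could be sketched roughly as follows. This is not the answerer's code: `tokenize`, `insertSemicolons`, the keyword set, and the guards are all assumptions made for illustration. In particular, the sketch only inserts a semicolon when a line break separates the keyword from the previous token, and never after `;`, `{`, `}`, or `else`; the answer states neither restriction, but without them the heuristic would break code such as `for (var i = 0; ...)` or `do { } while (x)`. The toy tokenizer also ignores regex and template literals.

```javascript
// Sketch only: helper names, token classes, and guards are illustrative assumptions.
// Alternatives, in order: whitespace, comment, string, word, number, any other char.
const TOKEN_RE =
  /(\s+)|(\/\/[^\n]*|\/\*[\s\S]*?\*\/)|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|([A-Za-z_$][\w$]*)|(\d+(?:\.\d+)?)|([\s\S])/g;
const TYPES = ["space", "comment", "string", "word", "number", "punct"];

// Split the source into tokens, keeping whitespace and comments so the
// original text can be reproduced exactly by joining the token texts.
function tokenize(source) {
  const tokens = [];
  for (const m of source.matchAll(TOKEN_RE)) {
    const group = m.findIndex((g, i) => i > 0 && g !== undefined);
    tokens.push({ type: TYPES[group - 1], text: m[0] });
  }
  return tokens;
}

const KEYWORDS = new Set(["if", "for", "while", "var"]); // "etc." in the answer
const NEVER_AFTER = new Set([";", "{", "}", "else"]);    // assumed guard list

function insertSemicolons(source) {
  const tokens = tokenize(source);
  // Walk backwards so that splicing never shifts tokens not yet visited.
  for (let i = tokens.length - 1; i >= 0; i--) {
    const tok = tokens[i];
    if (tok.type !== "word" || !KEYWORDS.has(tok.text)) continue;

    // Skip whitespace and comments before the keyword, noting line breaks.
    let j = i - 1;
    let sawNewline = false;
    while (j >= 0 && (tokens[j].type === "space" || tokens[j].type === "comment")) {
      if (tokens[j].text.includes("\n")) sawNewline = true;
      j--;
    }
    if (j < 0 || !sawNewline || NEVER_AFTER.has(tokens[j].text)) continue;

    // Insert directly after the previous significant token.
    tokens.splice(j + 1, 0, { type: "punct", text: ";" });
  }
  return tokens.map((t) => t.text).join("");
}

const src = "var a = 1 // first\nvar b = 2\nif (a) { b = 3 }\nwhile (b) b--\n";
console.log(insertSemicolons(src));
```

With this input, semicolons are added after `1` and `2`, while the comment and all whitespace are preserved. The trailing `b--` is left alone because no keyword follows it. The heuristic can also be wrong: an `if` on its own line directly under `if (a)` would get a semicolon after the `)`, which changes the meaning of the program.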

This approach works because mistakes aren't random. People always make the same mistakes. Most of the time, people forget the ; at the end of a line, and looking for a missing ; before a keyword is a good way to locate those places.

But this approach will only ever get you so far. If you must find all missing semicolons reliably, you have to write a JavaScript parser (or reuse an existing one).
