When parsing JavaScript, what determines the meaning of a slash?


Question

JavaScript has a tricky grammar to parse. A forward slash can mean several different things: the division operator, the start of a regular expression literal, a block-comment introducer, or a line-comment introducer. The last two are easy to distinguish: if the slash is followed by a star, it starts a multiline comment; if it is followed by another slash, it starts a line comment.
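The easy half of that distinction can be sketched as a one-character lookahead. This is only an illustration: `classifySlash` is a hypothetical helper name, and it assumes the slash sits at the start of a token (not inside a string, comment, or regex body).

```javascript
// Hypothetical helper: classify a "/" at index i by looking one character ahead.
// Assumes source[i] is the first character of a new token.
function classifySlash(source, i) {
  if (source[i] !== "/") {
    throw new Error("expected a slash at index " + i);
  }
  const next = source[i + 1];
  if (next === "*") return "block-comment";
  if (next === "/") return "line-comment";
  // Division, division-assignment, or regex literal: lookahead is not enough.
  return "division-or-regex";
}
```

Everything that comes back as "division-or-regex" is the hard case the question is about.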

But the rules for disambiguating division from a regex literal are escaping me. I can't find them in the ECMAScript standard. There, the lexical grammar is explicitly divided into two goal symbols, InputElementDiv and InputElementRegExp, depending on what a slash will mean. But nothing explains when to use which.

And of course the dreaded semicolon insertion rules complicate everything.

Does anyone have an example of clear code for lexing JavaScript that answers this?

Recommended answer

It's actually fairly easy, but it requires making your lexer a little smarter than usual.

The division operator must follow an expression, and a regular expression literal can't follow an expression. So whenever the slash does not follow an expression, you can safely assume you're looking at a regular expression literal.

If you're doing it right, you already have to identify Punctuators as multiple-character strings. So look at the previous token and see whether it is any of these:

. ( , { } [ ; < > <= >= == != === !== + - * % ++ --
<< >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>=
&= |= ^= / /=

For most of these, you now know you're in a context where a regular expression literal can appear. In the case of ++ and --, you'll need to do some extra work. If the ++ or -- is a pre-increment/decrement, then the / following it starts a regular expression literal; if it is a post-increment/decrement, then the / following it is a DivPunctuator.
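As a rough sketch, the lookup described so far could be a set membership test on the previous punctuator. The names `REGEX_AFTER_PUNCTUATOR` and `slashAfterPunctuator` are made up for this illustration, and treating every punctuator outside the list as the end of an expression is a simplification.

```javascript
// Punctuators copied from the list above. "++" and "--" are handled
// separately because they need extra context.
const REGEX_AFTER_PUNCTUATOR = new Set([
  ".", "(", ",", "{", "}", "[", ";", "<", ">", "<=", ">=", "==", "!=",
  "===", "!==", "+", "-", "*", "%", "<<", ">>", ">>>", "&", "|", "^",
  "!", "~", "&&", "||", "?", ":", "=", "+=", "-=", "*=", "%=", "<<=",
  ">>=", ">>>=", "&=", "|=", "^=", "/", "/=",
]);

// Hypothetical helper: what does a "/" mean right after this punctuator?
function slashAfterPunctuator(prev) {
  if (prev === "++" || prev === "--") return "needs-more-context";
  // Simplification: any punctuator not in the list is assumed to end an expression.
  return REGEX_AFTER_PUNCTUATOR.has(prev) ? "regex" : "division";
}
```

For example, `slashAfterPunctuator("=")` gives "regex" and `slashAfterPunctuator("]")` gives "division".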

Fortunately, you can determine whether it is a "pre-" operator by checking the token before it. First, post-increment/decrement is a restricted production, so if the ++ or -- is preceded by a line break, you know it is "pre-". Otherwise, if the previous token is any of the things that can precede a regular expression literal (yay recursion!), you know it is "pre-". In all other cases, it is "post-".
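The recursion can be sketched with two mutually recursive functions. The token shape (`type`, `value`, `newlineBefore`) and all function names are assumptions of this sketch, which only looks at punctuators; a fuller version would also have to consider keywords.

```javascript
// Same punctuator list as above, minus "++" and "--", which are resolved recursively.
const PUNCTUATORS_BEFORE_REGEX = new Set([
  ".", "(", ",", "{", "}", "[", ";", "<", ">", "<=", ">=", "==", "!=",
  "===", "!==", "+", "-", "*", "%", "<<", ">>", ">>>", "&", "|", "^",
  "!", "~", "&&", "||", "?", ":", "=", "+=", "-=", "*=", "%=", "<<=",
  ">>=", ">>>=", "&=", "|=", "^=", "/", "/=",
]);

// Assumed token shape for this sketch:
//   { type: "punctuator" | "other", value: string, newlineBefore: boolean }
function isIncrementOrDecrement(token) {
  return token.type === "punctuator" &&
    (token.value === "++" || token.value === "--");
}

// True if a regex literal may follow tokens[i].
function canPrecedeRegex(tokens, i) {
  if (i < 0) return true; // start of input: an expression may begin here
  const token = tokens[i];
  if (token.type !== "punctuator") return false;
  if (isIncrementOrDecrement(token)) return isPrefixOperator(tokens, i);
  return PUNCTUATORS_BEFORE_REGEX.has(token.value);
}

// True if the "++" or "--" at tokens[i] is a pre-increment/decrement.
function isPrefixOperator(tokens, i) {
  if (tokens[i].newlineBefore) return true; // restricted production
  return canPrecedeRegex(tokens, i - 1);
}
```

With tokens for `x ++`, `isPrefixOperator` returns false, so a following `/` is division; with tokens for `= ++`, it returns true.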

Of course, the ) punctuator doesn't always indicate the end of an expression, as in if (something) /regex/.exec(x). This is tricky because it does require some semantic understanding to disentangle.
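One possible workaround, not part of the answer itself, is to find the ( that matches the ) and check whether a statement keyword sits directly before it. The sketch below assumes tokens are plain strings; the keyword list and the name `closesStatementHead` are choices made for this illustration, and edge cases such as a property named `if` are ignored.

```javascript
// Keywords whose parenthesised head is followed by a statement.
// This list is an assumption of the sketch, not taken from the answer.
const STATEMENT_HEAD_KEYWORDS = new Set(["if", "while", "for", "with"]);

// tokens: array of token strings; closeIndex: index of a ")" token.
// Returns true when the matching "(" is directly preceded by one of the
// keywords above, in which case a following "/" would start a regex literal.
function closesStatementHead(tokens, closeIndex) {
  let depth = 0;
  for (let i = closeIndex; i >= 0; i--) {
    if (tokens[i] === ")") {
      depth++;
    } else if (tokens[i] === "(") {
      depth--;
      if (depth === 0) {
        return i > 0 && STATEMENT_HEAD_KEYWORDS.has(tokens[i - 1]);
      }
    }
  }
  return false; // unbalanced input: treat ")" as ending an expression
}
```

For the tokens of `if (something)` this returns true, while for `(a + b)` it returns false, so the following `/` is read as division.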

Sadly, that's not quite all. There are some operators that are not Punctuators, and other notable keywords to boot. Regular expression literals can also follow these. They are:

new delete void typeof instanceof in do return case throw else

If the IdentifierName you just consumed is one of these, then you're looking at a regular expression literal; otherwise, it's a DivPunctuator.
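A minimal sketch of that keyword check, using the list above; the names `KEYWORDS_BEFORE_REGEX` and `slashAfterIdentifierName` are invented for the example.

```javascript
// Keywords copied from the list above: a regex literal may follow any of them.
const KEYWORDS_BEFORE_REGEX = new Set([
  "new", "delete", "void", "typeof", "instanceof", "in", "do",
  "return", "case", "throw", "else",
]);

// Hypothetical helper: what does a "/" mean right after this IdentifierName?
function slashAfterIdentifierName(name) {
  return KEYWORDS_BEFORE_REGEX.has(name) ? "regex" : "division";
}
```

So `slashAfterIdentifierName("return")` gives "regex", while an ordinary name such as `total` gives "division".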

The above is based on the ECMAScript 5.1 specification and does not include any browser-specific extensions to the language. But if you need to support those, this should provide easy guidelines for determining which sort of context you're in.

Of course, most of the above represent very silly cases for including a regular expression literal. For example, you can't actually pre-increment a regular expression, even though it is syntactically allowed. So most tools can get away with simplifying the regular expression context check for real-world applications. JSLint's method of checking the preceding character for (,=:[!&|?{}; is probably sufficient. But if you take such a shortcut when developing what's supposed to be a tool for lexing JS, then you should make sure to note that.
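A shortcut in that spirit might look like the sketch below. It is not JSLint's actual code: the function name is hypothetical, and skipping whitespace and treating the start of input as a regex context are assumptions of this sketch.

```javascript
// The characters mentioned above.
const SHORTCUT_CHARS = "(,=:[!&|?{};";

// Hypothetical helper: guess whether the "/" at slashIndex starts a regex
// literal by inspecting only the nearest preceding non-whitespace character.
function slashStartsRegexShortcut(source, slashIndex) {
  let i = slashIndex - 1;
  while (i >= 0 && /\s/.test(source[i])) i--;
  if (i < 0) return true; // assumption: start of input is a regex context
  return SHORTCUT_CHARS.includes(source[i]);
}
```

This accepts `x = /ab/` and rejects `a / b`, but it also rejects `return /ab/`, because the preceding character is a letter. That is the kind of limitation worth noting.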
