ECMAScript: Lexical Grammar vs Syntactic Grammar

Question

5.1.2 The Lexical and RegExp Grammars

A lexical grammar for ECMAScript is given in clause 11. This grammar has as its terminal symbols Unicode code points that conform to the rules for SourceCharacter defined in 10.1. It defines a set of productions, starting from the goal symbol InputElementDiv, InputElementTemplateTail, or InputElementRegExp, or InputElementRegExpOrTemplateTail, that describe how sequences of such code points are translated into a sequence of input elements.

Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of the ECMAScript language.

5.1.4 The Syntactic Grammar

When a stream of code points is to be parsed as an ECMAScript Script or Module, it is first converted to a stream of input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by a single application of the syntactic grammar.


Questions

  1. Lexical grammar
    • Here it says the terminal symbols are Unicode code points (individual characters)
    • It also says it produces input elements (aka tokens)
    • How are these reconcilable? Either the terminal symbols are tokens, and thus it produces tokens. Or the terminal symbols are individual code points, and that's what it produces.
  2. Syntactic grammar
    • I have the same doubts about this grammar as about the lexical grammar
    • It seems the terminal symbols here are tokens
    • So by applying the syntactic grammar rules, valid tokens are produced, which in turn can be sent to the parser? Or does this grammar accept tokens as input and then test the overall stream of tokens for validity?


My best guess

  1. Lexing phase
    • Input: Code points (source code)
    • Output: Applies lexical grammar productions to produce valid tokens (lexeme type + value) as output
  2. Parsing phase
    • Input: Tokens
    • Output: Applies syntactic grammar productions (CFG) to decide if all the tokens together represent a valid stream (i.e. that the source code as a whole is a valid Script / Module)

Recommended answer

I think you are confused about what "terminal symbol" means. Terminal symbols are in fact the inputs of the parser, not its outputs (the output is a parse tree, including the degenerate case of a list).

On the other hand, a production rule does indeed have terminal symbols as its output and a goal symbol as its input. It works backwards, and that is where the term "terminal" comes from. A non-terminal can be expanded (in different ways, which is what the rules describe) into a sequence of terminal symbols.

Example:

Language:
   S -> T | S '_' T
   T -> D | T D
   D -> '0' | '1' | '2' | … | '9'

String:
   12_45

Production:
     S          // start: the goal
   = S '_' T
   = T '_' T
   = T D '_' T
   = T '2_' T
   = D '2_' T
   = '12_' T
   = '12_' T D
   = '12_' T '5'
   = '12_' D '5'
   = '12_45'     // end: the terminals

Parse tree:
   S
    S
     T
      T
       D
        '1'
      D
       '2'
    '_'
    T
     T
      D
       '4'
     D
      '5'

Parser output (generating a sequence of items from top-level Ts):
   '12'
   '45'
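
The example above can be reproduced with a small hand-written parser. This is only a sketch: the function names (parse_d, parse_t, parse_s, leaves, top_level_items) and the tuple-based tree shape are made up for illustration, and the left-recursive rules are implemented with loops, which is a common substitution rather than a literal transcription of the grammar. Note that the parser consumes the terminals (individual characters) as input and builds the tree as output.

```python
DIGITS = set("0123456789")


def parse_d(text, pos):
    """D -> '0' | ... | '9': consume exactly one digit terminal."""
    if pos < len(text) and text[pos] in DIGITS:
        return ("D", text[pos]), pos + 1
    raise SyntaxError(f"expected a digit at position {pos}")


def parse_t(text, pos):
    """T -> D | T D: one or more digits, nested to the left."""
    node, pos = parse_d(text, pos)
    tree = ("T", node)
    while pos < len(text) and text[pos] in DIGITS:
        node, pos = parse_d(text, pos)
        tree = ("T", tree, node)
    return tree, pos


def parse_s(text):
    """S -> T | S '_' T: one or more Ts separated by '_'."""
    t, pos = parse_t(text, 0)
    tree = ("S", t)
    while pos < len(text) and text[pos] == "_":
        t, pos = parse_t(text, pos + 1)
        tree = ("S", tree, "_", t)
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at position {pos}")
    return tree


def leaves(tree):
    """Concatenate all terminals below a node (a bare string is a terminal)."""
    if isinstance(tree, str):
        return tree
    return "".join(leaves(child) for child in tree[1:])


def top_level_items(tree):
    """Collect the text of every T that hangs directly off the S chain."""
    items = []
    while len(tree) == 4:          # ("S", S, "_", T)
        items.append(leaves(tree[3]))
        tree = tree[1]
    items.append(leaves(tree[1]))  # ("S", T)
    return items[::-1]


tree = parse_s("12_45")
print(top_level_items(tree))  # ['12', '45']
```

The nested tuples returned by parse_s mirror the parse tree shown above, and top_level_items corresponds to the "parser output" list.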

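So

  • The lexing phase takes code points as input and produces tokens as output. Code points are the terminal symbols of the lexical grammar.
  • The syntactic phase takes tokens as input and produces a program as output. Tokens are the terminal symbols of the syntactic grammar.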
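
The two phases can be sketched end to end as follows. This is a toy under loud assumptions, not the ECMAScript grammar: the only reserved word is `let`, the only punctuators are `=`, `+` and `;`, and the statement form is invented for the illustration. The regex group names borrow nonterminal names from the specification, but the patterns are heavily simplified, and lex and parse are hypothetical helper names.

```python
import re

# Toy "lexical grammar": its terminals are individual characters.
INPUT_ELEMENT = re.compile(r"""
    (?P<WhiteSpace>\s+)
  | (?P<Comment>//[^\n]*)
  | (?P<NumericLiteral>\d+)
  | (?P<IdentifierName>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<Punctuator>[=+;])
""", re.VERBOSE)

RESERVED = {"let"}  # assumption: a one-word reserved list for the toy


def lex(source):
    """Repeatedly apply the lexical grammar: code points in, tokens out."""
    tokens, pos = [], 0
    while pos < len(source):
        m = INPUT_ELEMENT.match(source, pos)
        if m is None:
            raise SyntaxError(f"no input element matches at offset {pos}")
        kind, value, pos = m.lastgroup, m.group(), m.end()
        if kind in ("WhiteSpace", "Comment"):
            continue  # input elements, but not tokens
        if kind == "IdentifierName" and value in RESERVED:
            kind = "ReservedWord"
        tokens.append((kind, value))
    return tokens


def parse(tokens):
    """Apply the toy syntactic grammar once: tokens in, tree out.

    Script    -> Statement*
    Statement -> 'let' IdentifierName '=' Expr ';'
    Expr      -> Operand | Expr '+' Operand
    Operand   -> IdentifierName | NumericLiteral
    """
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos < len(tokens):
            k, v = tokens[pos]
            if k == kind and (value is None or v == value):
                pos += 1
                return v
        raise SyntaxError(f"expected {value or kind} at token {pos}")

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("IdentifierName", "NumericLiteral"):
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"expected an operand at token {pos}")

    statements = []
    while pos < len(tokens):
        expect("ReservedWord", "let")
        name = expect("IdentifierName")
        expect("Punctuator", "=")
        expr = operand()
        while pos < len(tokens) and tokens[pos] == ("Punctuator", "+"):
            pos += 1
            expr = ("+", expr, operand())
        expect("Punctuator", ";")
        statements.append(("let", name, expr))
    return ("Script", statements)


tokens = lex("let x = 1 + 2; // sum\n")
print(tokens)
print(parse(tokens))
```

With this sketch, `lex("let = 1;")` succeeds because every token is individually valid, while `parse` rejects the resulting stream. That is the sense in which the syntactic grammar takes tokens as its terminals and decides whether the whole stream forms a valid program.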
