如何在Javascript中实现词法分析 [英] How to implement Lexical Analysis in Javascript

查看:51
本文介绍了如何在Javascript中实现词法分析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

大家好,感谢您阅读

我目前正在尝试使用Google风格的计算器.您输入一个字符串,它确定是否可以计算该字符串并返回结果.

I am currently attempting to do a Google-style calculator. You input a string, it determines if it can be calculated and returns the result.

我从基础知识开始: +-/* 和括号处理.

I began slowly with the basics : + - / * and parenthesis handling.

随着时间的推移,我愿意改进计算器,并且不久前我已经学习了一些词法分析功能,我建立了一个标记和相关的正则表达式模式的列表.

I am willing to improve the calculator over time, and having learned a bit about lexical analysis a while ago, I built a list of tokens and associated regular expression patterns.

除了我正在开发仅使用Javascript的应用程序外,这种工作很容易适用于Lex和Yacc等语言.

This kind of work is easily applicable with languages such as Lex and Yacc, except I am developping a Javascript-only application.

我试图将这个想法转换成Javascript,但是我不知道如何以一种干净漂亮的方式来处理所有事情,尤其是嵌套的括号.

I tried to transcript the idea into Javascript but I can't figure out how to handle everything in a clean and beautiful way, especially nested parenthesis.

让我们定义什么是计算器查询:

Let's define what a calculator query is:

// NON TERMINAL EXPRESSIONS //
query     -> statement
query     -> ε // means end of query

statement -> statement operator statement
statement -> ( statement )
statement -> prefix statement
statement -> number

number    -> integer
number    -> float

// TERMINAL EXPRESSIONS //
operator  -> [+*/%^-]

prefix    -> -

integer   -> [0-9]+

float     -> [0-9]+[.,][0-9]+


Javascript

Lexical Analysis包含验证没有什么看起来不像终端表达式之一:运算符,前缀,整数和浮点数.可以将其缩短为一个正则表达式:


Javascript

Lexical Analysis consists in verifying there is nothing that doesn't look like one of the terminal expressions : operator, prefixes, integer and float. Which can be shortened to one regular expression:

(我添加了空格使其更具可读性)

(I added spaces to make it more readable)

var calcPat = 
/^ (\s*
    ( ([+/*%^-]) | ([0-9]+) | ([0-9]+[.,][0-9]+) | (\() | (\)) )
)+ \s* $/;

如果该测试通过,则查询在词法上是正确的,需要进行语法检查以确定是否可以计算出来.这是棘手的部分

If this test passes, query is lexically correct and needs to be grammar-checked to determine if it can be calculated. This is the tricky part

我不会粘贴代码,因为它不干净也不易于理解,但是我将解释我遵循的过程以及为什么会卡住:

I am not going to paste code because it is not clean nor easily understandable, but I am going to explain the process I followed and why I'm stuck:

我创建了一个名为 isStatement(string)的方法,该方法应该递归调用自身.主要思想是将字符串拆分为潜在"语句,并检查它们是否真的是语句并完全形成一个语句.
过程如下:

I created a method called isStatement(string) that's supposed to call itself recursively. The main idea is to split the string into 'potential' statements and check if they really are statements and form one altogether.
Process is the following:

-如果前两个标记是一个数字,后跟一个运算符:

-If the first two tokens are a number followed by an operator:

-然后,
-如果剩余的只是一个令牌并且是数字:
---这是一条语句.
---否则,检查其余标记是否构成语句(递归调用)

-Then,
-- If the remaining is just one token and it is a number:
--- Then this is a statement.
--- Else, check if the remaining tokens form a statement (recursive call)

-否则,如果第一个标记是括号
-然后,找到匹配的右括号并检查里面是否有一个语句(递归)
-还要在关闭括号后检查是否有东西,并且在与括号结构关联时是否形成语句.

-Else, If the first token is a parenthesis
-Then, Find matching closing parenthesis and check if what's inside is a statement (recursion)
-- Also check if there is something after closing parenthesis and if it forms a statement when associated with the parenthesis structure.

我的问题是,当存在嵌套结构时,找不到匹配的括号.我该怎么做?另外,如您所见,这不是一种特别通用且简洁的语法检查算法.您是否有任何想法可以改善这种模式?

My problem is that I cannot find matching parenthesis when there is nested structures. How can I do that ? Also, as you can see, this is not a particurlarly generic and clean grammar-checking algorithm. Do you have any idea to improve this pattern ?

非常感谢您抽出宝贵的时间阅读所有内容.盖尔

Thank you so much for having taken the time to read everything. Gael

(PS:您可能已经注意到,我不是英语为母语的人!很抱歉犯了所有错误!)

(PS: As you probably noticed, I am not a native english speaker ! Sorry for mistakes and all !)

推荐答案

您对什么是词法分析有了正确的认识,但是您似乎对令牌语法之间的区别感到困惑>和语言语法.那是两件事.

You've got the right idea about what lexical analysis is, but you seem to have gotten confused about the distinction between the token grammar and the language grammar. Those are two different things.

  • 令牌语法是一组模式(通常为正则表达式),用于描述要解析的语言的令牌.正则表达式是字符集上的表达式.

  • The token grammar is the set of patterns (usually regular expressions) that describe the tokens for the language to be parsed. The regular expressions are expressions over a character set.

语言语法(或者我想是 target 语法)是您要解析的语言的语法.这种语法是用记号表示的.

The language grammar (or target grammar, I suppose) is the grammar for the language you want to parse. This grammar is expressed in terms of tokens.

您不能编写正则表达式来解析代数表示法.您不能.您可以 为此编写语法,但这不是常规语法.您想要做的是识别单独的标记,在您的情况下,可以使用类似于您所拥有的正则表达式来完成.诀窍是您并没有真正将表达式应用于要分析的整个句子.相反,您想在句子的当前点匹配一个标记.

You cannot write a regular expression to parse algebraic notation. You just can't. You can write a grammar for it, but it's not a regular grammar. What you want to do is recognize separate tokens, which in your case could be done with a regular expression somewhat like what you've got. The trick is that you're not really applying that expression to the overall sentence to be parsed. Instead, you want to match a token at the current point in the sentence.

现在,由于您可以使用Javascript正则表达式,因此您可以 提供一个旨在匹配令牌字符串的正则表达式.技巧将是找到一种方法,以识别出可能性列表中匹配的令牌.Javascript regex引擎可以为您提供分组的数组,因此也许您可以在此基础上构建一些东西.

Now, because you've got Javascript regular expressions to work with, you could come up with a regular expression designed to match a string of tokens. The trick with that will be coming up with a way to identify which token was matched out of the list of possibilities. The Javascript regex engine can give you back arrays of groups, so maybe you could build something on top of that.

edit —我正在尝试从一个单独的正则表达式列表(每个标记一个)开始,如何组合一个(某种程度上)通用标记生成器.它可能不是很复杂,并且很有趣.

edit — I'm trying to work out how you could put together a (somewhat) general-purpose tokenizer builder, starting from a list of separate regular expressions (one for each token). It's possibly not very complicated, and it'd be pretty fun to have around.

这篇关于如何在Javascript中实现词法分析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆