Does the Peg.js engine backstep after a lookahead like regexes do?

Question

According to regular-expressions.info on lookarounds, the engine steps back after a lookahead:

Let's take one more look inside, to make sure you understand the implications of the lookahead. Let's apply q(?=u)i to quit. The lookahead is now positive and is followed by another token. Again, q matches q and u matches u. Again, the match from the lookahead must be discarded, so the engine steps back from i in the string to u. The lookahead was successful, so the engine continues with i. But i cannot match u. So this match attempt fails. All remaining attempts fail as well, because there are no more q's in the string.

However, in Peg.js it SEEMS like the engine still moves past the & or !, so that it isn't a lookahead in the same sense as in regexes but a decision on consumption; there is no backstepping, and therefore no true looking ahead.

Is this the case?

(If so, then certain parses aren't even possible, like this one?)

Recommended answer

Lookahead works similarly to how it does in a regex engine: the & and ! predicates test the input without consuming it.

This rule fails to match quit: after the &'u' predicate succeeds without consuming input, the next letter to match is still 'u', not 'i'.

word = 'q' &'u' 'i' 't'

This rule succeeds:

word = 'q' &'u' 'u' 'i' 't'

This rule also succeeds:

word = 'q' 'u' 'i' 't'
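
To make the "no consumption" point concrete, here is a minimal sketch of the three rules above using hand-rolled combinators. The helpers lit, seq, and, and matches are hypothetical names invented for this illustration; they are not Peg.js's API and say nothing about how its generated parsers are implemented. The last lines run the analogous regexes for comparison.

```javascript
// Hypothetical mini-combinators for illustration only; not Peg.js internals.
// A parser takes (input, pos) and returns the new position, or -1 on failure.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? pos + s.length : -1;

// Sequence: run each parser from where the previous one stopped.
const seq = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    pos = p(input, pos);
    if (pos === -1) return -1;
  }
  return pos;
};

// And-predicate (&e): succeed if e matches, but hand back the ORIGINAL
// position, so nothing is consumed. This is the "step back".
const and = (p) => (input, pos) => (p(input, pos) === -1 ? -1 : pos);

// Require the whole input to be consumed, as a start rule would.
const matches = (p, input) => p(input, 0) === input.length;

const rule1 = seq(lit("q"), and(lit("u")), lit("i"), lit("t"));           // 'q' &'u' 'i' 't'
const rule2 = seq(lit("q"), and(lit("u")), lit("u"), lit("i"), lit("t")); // 'q' &'u' 'u' 'i' 't'
const rule3 = seq(lit("q"), lit("u"), lit("i"), lit("t"));                // 'q' 'u' 'i' 't'

const results = [rule1, rule2, rule3].map((r) => matches(r, "quit"));
console.log(results); // [ false, true, true ]

// The analogous regexes give the same three answers.
const regexResults = [/^q(?=u)it$/, /^q(?=u)uit$/, /^quit$/].map((re) =>
  re.test("quit")
);
console.log(regexResults); // [ false, true, true ]
```

In rule1 the predicate succeeds at position 1 and leaves the position at 1, so 'i' is then tried against 'u' and fails, which mirrors the q(?=u)i walkthrough quoted in the question.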

As for your example, try something along these lines; you shouldn't need lookaheads at all:

expression
    = termPair ( _ delimiter _ termPair )*

termPair
    = term ('.' term)? ' ' term ('.' term)?

term "term"
    = $([a-z0-9]+)

delimiter "delimiter"
    = "."

_ "whitespace"
    = [ \t\n\r]+

EDIT: Added another example per the comments below.

expression
    = first:term rest:delimTerm* { return [first].concat(rest); }

delimTerm
    = delimiter t:term { return t; }

term "term"
    = $((!delimiter [a-z0-9. ])+)

delimiter "delimiter"
    = _ "." _

_ "whitespace"
    = [ \t\n\r]+

EDIT: Added extra explanation of the term expression.

I'll try to break down the term rule, $((!delimiter [a-z0-9. ])+), a bit.

$() returns the text matched by everything inside it as a single string, much like calling [].join('') on the matched characters.

A single "character" of a term is any character in [a-z0-9. ]; to simplify, we could use . (any character) instead. Before matching each character we look ahead for a delimiter with !delimiter; if a delimiter starts at the current position, the predicate fails and the term ends there without consuming the delimiter. Since a term has multiple characters, we repeat the whole group with +.
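
The loop described above can be sketched by hand in plain JavaScript. This is only an illustration of the second grammar's logic, assuming the rules exactly as written there. The function names delimiter, term, and expression mirror the rule names but are hypothetical hand-written code, not what Peg.js generates.

```javascript
// Hand-written sketch of the second grammar; not Peg.js-generated code.
// Each rule function takes (input, pos) and returns { value, pos } or null.

// delimiter = _ "." _   with _ = [ \t\n\r]+ (whitespace required on both sides)
function delimiter(input, pos) {
  const m = /^[ \t\n\r]+\.[ \t\n\r]+/.exec(input.slice(pos));
  return m ? { value: m[0], pos: pos + m[0].length } : null;
}

// term = $((!delimiter [a-z0-9. ])+)
function term(input, pos) {
  const start = pos;
  while (pos < input.length) {
    if (delimiter(input, pos) !== null) break; // !delimiter: test only, pos unchanged
    if (!/[a-z0-9. ]/.test(input[pos])) break; // the character class
    pos += 1;                                  // consume exactly one character
  }
  // "+" needs at least one character; $() yields the matched text.
  return pos > start ? { value: input.slice(start, pos), pos } : null;
}

// expression = first:term rest:delimTerm*   with delimTerm = delimiter t:term
function expression(input) {
  const first = term(input, 0);
  if (first === null) return null;
  const terms = [first.value];
  let pos = first.pos;
  for (;;) {
    const d = delimiter(input, pos);
    if (d === null) break;
    const t = term(input, d.pos);
    if (t === null) break; // delimTerm failed: stay before the delimiter
    terms.push(t.value);
    pos = t.pos;
  }
  // This sketch returns null if input is left over; Peg.js would throw instead.
  return pos === input.length ? terms : null;
}

const parsed = expression("a.b c . d e.f");
console.log(parsed); // [ 'a.b c', 'd e.f' ]
```

Note that the dots in a.b and e.f stay inside their terms: only a dot with whitespace on both sides counts as a delimiter, and the !delimiter check is what stops each term just before one.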

I think it's a common idiom in PEG parsers to move forward this way. I learned the idea from the Treetop documentation on matching a string.
