Regex in jFlex with hardcoded exceptions

Question

I need a regex in jFlex that matches a string literal consisting of some characters, followed by a hyphen, followed by a word. However, there are a few hardcoded exceptions. My jFlex version is 1.6.1.

My regex is:

SUFFIXES = labeled|deficient
ALPHANUMERIC = [:letter:]|[:digit:]
AVOID_SUFFIXES = {SUFFIXES} | !({ALPHANUMERIC}+)
WORD = ({ALPHANUMERIC}+([\-\/\.]!{AVOID_SUFFIXES})*)

The string "MXs12-labeled" should be tokenized into 'MXs12', '-', 'labeled' (the hyphen is caught later by a different regex), and "MXs12-C123" should be tokenized into 'MXs12-C123', since C123 is not on the list of suffixes.

However, the token I obtain is "MXs12-labele", which is one letter short of the suffix forbidden by the exception.

An obvious solution would be to include an additional non-{ALPHANUMERIC} character in the regex, but that would add this character to the match as well.

Another solution seemed to be a negative lookahead, but every attempt to use one produces a syntax error, so jFlex does not seem to support them. (See "Flex seems do not support a regex lookahead assertion (the fast lex analyzer)".)

Does anyone know how to solve this in jFlex?

Recommended answer

As you've observed, it's much easier to work with positive matches than with negative matches. (Clearly, labele does not match labeled, and furthermore it's the longest prefix of labeled which doesn't match labeled, so it's logical that if you try to match a word which is !labeled, you'll get labele as a match.)
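The "longest prefix" argument can be checked outside JFlex. The following is a minimal Java sketch, not JFlex code: it assumes that `!{AVOID_SUFFIXES}` in the question amounts to "an alphanumeric run that is not one of the listed suffixes", and the class and method names are hypothetical.

```java
import java.util.Set;

public class LongestNonSuffixPrefix {
    // The hardcoded exceptions from the question's SUFFIXES macro.
    private static final Set<String> SUFFIXES = Set.of("labeled", "deficient");

    // Returns the longest non-empty alphanumeric prefix of text that is not
    // itself one of the suffixes, mimicking a longest-match scanner that is
    // only told "match anything except these exact words".
    public static String longestAllowedPrefix(String text) {
        // Find where the leading alphanumeric run ends.
        int run = 0;
        while (run < text.length() && Character.isLetterOrDigit(text.charAt(run))) {
            run++;
        }
        // Try candidates from longest to shortest; the first allowed one wins.
        for (int end = run; end > 0; end--) {
            String candidate = text.substring(0, end);
            if (!SUFFIXES.contains(candidate)) {
                return candidate;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(longestAllowedPrefix("labeled"));      // labele
        System.out.println(longestAllowedPrefix("C123"));         // C123
        System.out.println(longestAllowedPrefix("labeledblack")); // labeledblack
    }
}
```

Dropping a single character is enough to make the candidate "not a suffix", which is why the scanner happily stops at labele.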

JFlex does not implement negative lookahead assertions, which are slightly different but still problematic. A negative lookahead assertion would certainly reject the suffix in MXs12-labeled, but it would also reject the suffix in MXs12-labeledblack, which would be a bit surprising, I think.
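To see that behaviour concretely, here is a small sketch using `java.util.regex`, which does support negative lookahead (JFlex does not, so this cannot be pasted into a lexer specification). It assumes ASCII-only alphanumerics and the question's delimiter class; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSketch {
    // A word, optionally continued by delimiter + word, unless the text right
    // after the delimiter starts with one of the forbidden suffixes.
    private static final Pattern WORD = Pattern.compile(
            "[A-Za-z0-9]+(?:[-/.](?!labeled|deficient)[A-Za-z0-9]+)*");

    // Returns the token matched at the start of the input, or "" if none.
    public static String firstToken(String input) {
        Matcher m = WORD.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstToken("MXs12-labeled"));      // MXs12
        System.out.println(firstToken("MXs12-C123"));         // MXs12-C123
        System.out.println(firstToken("MXs12-labeledblack")); // MXs12
    }
}
```

The last case is the surprising one: the lookahead only inspects the text right after the delimiter, so any word that merely begins with labeled is cut off as well.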

If you rephrase this with positive matches, though, it's really simple. The idea is to specify what needs to be done with every positive match. In this case, what we'll want to do with the positive match of -labeled is to put it back into the input stream, which can be done with yypushback. That would suggest rules something like this:

/* DELIMITER is assumed to be a macro for the question's [\-\/\.] class */
{ALPHANUMERIC}+ ({DELIMITER}{ALPHANUMERIC}+)* "-labeled"    { yypushback(8);  /* return the WORD */ }
{ALPHANUMERIC}+ ({DELIMITER}{ALPHANUMERIC}+)* "-deficient"  { yypushback(10); /* return the WORD */ }
{ALPHANUMERIC}+ ({DELIMITER}{ALPHANUMERIC}+)*               { /* return the WORD */ }

Note that order is important, since the sequence relies on the first two patterns having higher precedence than the last pattern. (Inputs which match one of the first two patterns will also match the last pattern, but with the rules in the order indicated the last pattern will not win.)
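The interaction between longest match, rule order, and pushback can be simulated in plain Java. The sketch below is only an approximation of what a generated scanner does: it assumes ASCII alphanumerics, treats DELIMITER as the question's `[\-\/\.]` class, adds a stand-in rule for the delimiter that the question says is handled elsewhere, and uses hypothetical class and method names throughout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PushbackSketch {
    // Assumed stand-in for {ALPHANUMERIC}+ ({DELIMITER}{ALPHANUMERIC}+)*
    private static final String WORD = "[A-Za-z0-9]+(?:[-/.][A-Za-z0-9]+)*";

    // Rules in priority order, mirroring the three rules above plus a
    // stand-in for the separate delimiter rule mentioned in the question.
    private static final Pattern[] RULES = {
        Pattern.compile(WORD + "-labeled"),
        Pattern.compile(WORD + "-deficient"),
        Pattern.compile(WORD),
        Pattern.compile("[-/.]"),
    };
    // Number of characters each rule hands back, as yypushback would.
    private static final int[] PUSHBACK = {8, 10, 0, 0};

    // Length of the longest prefix of input starting at pos that the rule
    // matches in full, or 0 if there is none.
    private static int longestMatch(Pattern rule, String input, int pos) {
        for (int end = input.length(); end > pos; end--) {
            if (rule.matcher(input).region(pos, end).matches()) {
                return end - pos;
            }
        }
        return 0;
    }

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                int len = longestMatch(RULES[i], input, pos);
                if (len > bestLen) { // strict comparison: earlier rule wins ties
                    best = i;
                    bestLen = len;
                }
            }
            if (best < 0) { // no rule applies: skip one character
                pos++;
                continue;
            }
            int kept = bestLen - PUSHBACK[best]; // text left after pushback
            tokens.add(input.substring(pos, pos + kept));
            pos += kept;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] inputs = {"MXs12-labeled", "MXs12-C123",
                           "MXs12-labeledblack", "MXs12-labeled-black"};
        for (String s : inputs) {
            System.out.println(s + " -> " + tokenize(s));
        }
    }
}
```

For MXs12-labeled, the first and third rules both match all 13 characters, so the earlier rule wins and pushes -labeled back, which leaves MXs12 as the token. For MXs12-labeledblack and MXs12-labeled-black, the plain word rule matches more text than the suffix rule, so longest match picks it and the whole input comes out as one token.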

That might or might not be what you really want. It will handle MXs12-labeled and MXs12-C123 as indicated in your question. MXs12-labeledblack and MXs12-labeled-black will both be reported as single tokens; it's not at all clear to me what your expectations are on these inputs.
