Flex RegEx not getting matched?


Problem description

I've been working with Flex/Bison for about 6 hours now, and here is the first problem I don't seem to be able to solve:

I have the following file...

 state state1: {
     1-3: 255
     4: 255
 }

...which I pass to my flex/bison program using cat and |. The flex file contains this line:

\bstate\b  { return STATE; }

and further down this one:

.*         { fprintf(stderr, "Lexer error on line %d: \"%s\"\n", linenum, yytext); exit(-1); }

One would think that \bstate\b should get matched in the file, but it doesn't. Instead I get the following output:

"exer error on line 1: "state state1: {

This is strange in several ways. Firstly, the L in Lexer seems to have been replaced by a ", but more importantly, state didn't get matched. Why?

Of course the \bstate\b rule comes before the .* rule, and both are in the right section.

Thanks for your help.

Recommended answer

(F)Lex does not search the input for a match. It tries all the patterns at the current input position and selects the one that matches the most text, or the earliest one if more than one pattern matches the same amount of text. The next match then starts where the previous one ended.
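The selection rule described here can be sketched in a few lines of C. This is only an illustration of the rule, not how flex is implemented internally (flex compiles all patterns into a single automaton), and the name pick_rule is hypothetical. It assumes you already know how many characters each rule would match at the current position, listed in the order the rules appear in the file.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the winning rule, or -1 if no rule matches.
 * match_lengths[i] is the number of characters rule i would match at the
 * current input position (0 means "no match"). Rules are in file order. */
int pick_rule(const size_t match_lengths[], size_t rule_count) {
    assert(match_lengths != NULL);
    int winner = -1;
    size_t best = 0;
    for (size_t i = 0; i < rule_count; i++) {
        /* Strictly greater: on a tie, the earlier rule keeps the win. */
        if (match_lengths[i] > best) {
            best = match_lengths[i];
            winner = (int)i;
        }
    }
    return winner;
}
```

For example, if a keyword rule matches 5 characters and a later .* rule matches the whole line, the .* rule wins even though it is listed second.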

.* matches the rest of the line. \bstate\b could match at most seven characters, so .* would win anyway. But \bstate\b does not actually match at all, because this is lex, not <insert your favourite regex syntax here>, and in lex \b means backspace, just as it would in a C program.
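A small C sketch makes the backspace reading concrete. It treats the pattern as the literal text it denotes, namely a backspace, the letters s-t-a-t-e, and another backspace. The function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The literal text that the pattern \bstate\b stands for when \b is a
 * backspace character rather than a word-boundary assertion. */
static const char BACKSPACE_STATE[] = "\bstate\b";

/* Number of characters the pattern would consume if it matched. */
size_t backspace_pattern_length(void) {
    return strlen(BACKSPACE_STATE);
}

/* Returns 1 if input begins with backspace, "state", backspace. */
int starts_with_backspace_state(const char *input) {
    assert(input != NULL);
    return strncmp(input, BACKSPACE_STATE, strlen(BACKSPACE_STATE)) == 0;
}
```

The first line of the question's input starts with the letter s, not a backspace, so the check fails on it. Writing the rule as plain state (with no \b) is presumably what was intended.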

The reason the letter L is overwritten with a quote is probably that your input file was created on Windows and has \r\n at the end of each line. .* will match up to and including the \r, which is a carriage return. So when you print "%s"\n, the last character of the string that replaces %s is a carriage return, which moves the cursor back to the first column of the current line, where the L had been printed. The closing " is then printed on top of the L, and finally the newline character starts a new line.
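If the carriage return is indeed the cause, one way to keep the error message readable is to drop a trailing \r from the matched text before printing it. The helper below is a hypothetical sketch, not part of flex. It modifies the buffer in place, so in a real scanner you would want to apply it to a copy of yytext or only just before exiting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removes a single trailing carriage return from text, in place.
 * Returns the resulting length. */
size_t strip_trailing_cr(char *text) {
    assert(text != NULL);
    size_t length = strlen(text);
    if (length > 0 && text[length - 1] == '\r') {
        text[length - 1] = '\0';
        length--;
    }
    return length;
}
```

Another option is to convert the input file to Unix line endings, or to add a scanner rule that matches and ignores \r.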

There is no Lex equivalent of the word-boundary assertion \b, but that is very rarely a problem. Lexical scanners for practically all programming languages have to cope with the fact that reserved words also match the pattern for identifiers; however, the combination of the longest-match and first-match rules makes this easy to handle. Put simply, always put reserved word patterns first. For example:

do              { return DO; }
double          { return DOUBLE; }
if              { return IF; }
/* ... */
[a-z][a-z0-9]*  { return ID; }

The order in which you put do and double doesn't matter in the above example, because double is longer, but I always feel you should put reserved words in alphabetical order for tidiness. It is important, though, that the ID pattern goes last, because it also matches all of the reserved words.

Now consider what happens when lexing an identifier that starts with a reserved word, like dog. In this case, the DO pattern and the ID pattern both match, but the ID match is longer, so it wins despite coming later.
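The keyword-versus-identifier behaviour can be imitated in plain C to see why dog comes out as an identifier. This is a hand-written sketch of the four rules from the example, not generated scanner code. The names next_token, match_keyword, match_identifier, and the TOK_ constants are hypothetical, and the character-range checks assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_DO, TOK_DOUBLE, TOK_IF, TOK_ID };

/* Length matched by a fixed keyword at the start of input, or 0. */
size_t match_keyword(const char *input, const char *keyword) {
    size_t n = strlen(keyword);
    return strncmp(input, keyword, n) == 0 ? n : 0;
}

/* Length matched by [a-z][a-z0-9]* at the start of input, or 0. */
size_t match_identifier(const char *input) {
    size_t n = 0;
    if (input[0] < 'a' || input[0] > 'z')
        return 0;
    while ((input[n] >= 'a' && input[n] <= 'z') ||
           (input[n] >= '0' && input[n] <= '9'))
        n++;
    return n;
}

/* Applies the rules in file order: longest match wins, ties go to the
 * earlier rule. Stores the matched length in *length. */
enum token next_token(const char *input, size_t *length) {
    assert(input != NULL && length != NULL);
    const enum token tokens[4] = { TOK_DO, TOK_DOUBLE, TOK_IF, TOK_ID };
    size_t lengths[4];
    lengths[0] = match_keyword(input, "do");
    lengths[1] = match_keyword(input, "double");
    lengths[2] = match_keyword(input, "if");
    lengths[3] = match_identifier(input);

    enum token best = TOK_NONE;
    size_t best_length = 0;
    for (size_t i = 0; i < 4; i++) {
        if (lengths[i] > best_length) {   /* strict: earlier rule wins ties */
            best_length = lengths[i];
            best = tokens[i];
        }
    }
    *length = best_length;
    return best;
}
```

With input double, both the DOUBLE rule and the ID rule match six characters, and DOUBLE wins only because it is listed first. With doubles, the ID rule matches seven characters and wins on length.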
