Flex / Lex Encoding Strings with Escaped Characters


Question

For background, I'll refer to this question:

Regular expression for a string literal in flex/lex

The problem I am having is handling input with escaped characters in my lexer. I think it may be an issue with the encoding of the string, but I'm not sure.

Here is how I am handling string literals in my lexer:

\"(\\.|[^\\"])*\" {
    char* text1 = strndup(yytext + 1, strlen(yytext) - 2);
    char* text2 = "text\n";

    printf("value = <%s> <%x>\n", text1, text1);
    printf("value = <%s> <%x>\n", text2, text2);
}

This outputs the following:

value = <text\n"> <15a1bb0>
value = <text
> <7ac871>

The lexer appears to be treating the newline escape as two separate characters, a backslash followed by an n.

What's going on here, and how do I process the text so that it is identical to the C string literal?

Recommended answer

Your regexp only matches backslash escapes inside the string; it does not translate them into the characters they represent. The C compiler performs that translation for the literal "text\n" at compile time, whereas yytext holds the raw input characters, so the lexer has to do the conversion itself. I prefer to handle this sort of thing with a flex start state and a string-building buffer that accumulates characters. Something like:

%{
/* StringBuffer and its helper functions are assumed to be defined elsewhere. */
static StringBuffer strbuf;
%}
%x string
%%

\"                  { BEGIN string; ClearBuffer(strbuf); }
<string>[^\\"\n]+   { AppendBufferString(strbuf, yytext); }
<string>\\n         { AppendBufferChar(strbuf, '\n'); }
<string>\\t         { AppendBufferChar(strbuf, '\t'); }
<string>\\[0-7]{1,3} { AppendBufferChar(strbuf, strtol(yytext+1, 0, 8)); }
<string>\\[\\"]     { AppendBufferChar(strbuf, yytext[1]); }
<string>\"          { yylval.str = strdup(BufferData(strbuf)); BEGIN INITIAL; return STRING; }
<string>\\(.|\n)    { error("bogus escape '%s' in string\n", yytext); }
<string>\n          { error("newline in string\n"); }

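The rules above rely on a StringBuffer type and on ClearBuffer, AppendBufferString, AppendBufferChar and BufferData, none of which the answer defines. The following is only a minimal sketch of what they might look like in C, under the assumption that a zero-initialized buffer is valid and empty. Note that this sketch takes a pointer to the buffer, so with it the calls would be written as ClearBuffer(&strbuf) and so on (or wrapped in macros); the field names and the ReserveBuffer helper are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; not defined in the original answer. */
typedef struct {
    char  *data;  /* NUL-terminated contents, or NULL before first use */
    size_t len;   /* characters stored, not counting the NUL */
    size_t cap;   /* bytes allocated for data */
} StringBuffer;

/* Ensure room for `extra` more characters plus the terminating NUL.
   Returns 1 on success, 0 if allocation fails. */
static int ReserveBuffer(StringBuffer *b, size_t extra) {
    size_t need = b->len + extra + 1;
    if (need <= b->cap) return 1;
    size_t cap = b->cap ? b->cap : 16;
    while (cap < need) cap *= 2;
    char *p = realloc(b->data, cap);
    if (!p) return 0;
    b->data = p;
    b->cap = cap;
    return 1;
}

/* Empty the buffer but keep its allocation for reuse. */
void ClearBuffer(StringBuffer *b) {
    b->len = 0;
    if (b->data) b->data[0] = '\0';
}

/* Append one character (sketch: silently ignored if out of memory). */
void AppendBufferChar(StringBuffer *b, char c) {
    if (!ReserveBuffer(b, 1)) return;
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
}

/* Append a NUL-terminated string, such as yytext. */
void AppendBufferString(StringBuffer *b, const char *s) {
    size_t n = strlen(s);
    if (!ReserveBuffer(b, n)) return;
    memcpy(b->data + b->len, s, n + 1);  /* copies the NUL as well */
    b->len += n;
}

/* Current contents as a C string; never returns NULL. */
const char *BufferData(const StringBuffer *b) {
    return b->data ? b->data : "";
}
```

Because the contents are handed to strdup as a C string, an escape that yields a NUL byte (such as \0) would cut the resulting string short; carrying the length alongside the data would avoid that.
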
Using a start state this way makes what is going on much clearer, makes it easy to add processing code for new escapes, and makes it easy to issue clear error messages when something goes wrong.
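If you would rather keep the single-regex rule from the question, the same translation can be done as a post-processing step on the matched text. The function below is a hedged sketch, not part of the original answer: unescape is a hypothetical helper name, and it handles only the escapes covered in the rules above (\n, \t, \\, \" and octal escapes of up to three digits), leaving any unknown escape unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the first n bytes of src
   with backslash escapes translated, or NULL if allocation fails.
   The caller frees the result. */
char *unescape(const char *src, size_t n) {
    char *out = malloc(n + 1);  /* the result is never longer than the input */
    if (!out) return NULL;
    size_t i = 0, j = 0;
    while (i < n) {
        /* Ordinary character, or a lone backslash at the very end. */
        if (src[i] != '\\' || i + 1 >= n) {
            out[j++] = src[i++];
            continue;
        }
        char c = src[i + 1];
        i += 2;
        switch (c) {
        case 'n':  out[j++] = '\n'; break;
        case 't':  out[j++] = '\t'; break;
        case '\\':
        case '"':  out[j++] = c; break;
        default:
            if (c >= '0' && c <= '7') {
                /* Octal escape: one to three digits. */
                int v = c - '0';
                int digits = 1;
                while (digits < 3 && i < n && src[i] >= '0' && src[i] <= '7') {
                    v = v * 8 + (src[i++] - '0');
                    digits++;
                }
                out[j++] = (char)v;
            } else {
                /* Unknown escape: keep both characters as they were. */
                out[j++] = '\\';
                out[j++] = c;
            }
        }
    }
    out[j] = '\0';
    return out;
}
```

In the question's rule this could replace the strndup call, for example char* text1 = unescape(yytext + 1, yyleng - 2);, where yyleng is the length of the match that flex provides. The trade-off compared with the start-state version is that a bad escape is only noticed after the whole literal has been matched, which makes precise error reporting harder.
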
