Why does this yacc + lex basic parser not handle CONTROL+D / EOF?


Question

I have a yacc/lex program to handle lines like the ones below (this example handles just one format, but the idea is that it will eventually handle more):

% cat test.csv 
20191201 170003296,1.102290,1.102470,0
20191201 170004413,1.102320,1.102470,0
20191201 170005270,1.102290,1.102470,0
20191201 170006063,1.102280,1.102460,0
20191201 170006629,1.102260,1.102440,0
20191201 170007523,1.102410,1.102470,0
20191201 170007573,1.102410,1.102530,0
20191201 170035268,1.102490,1.102530,0
20191201 170036505,1.102490,1.102540,0
20191201 170043219,1.102490,1.102530,0

The lexer (lexer.l):

%{

#include <time.h>
#include "grammar.h"

void read_float_number(void);
void read_integer_number(void);
void read_date_YYYYMMDD_HHMMSSmmm(void);
void yyerror(const char* msg);

%}

%%    
                                                                                                /* YYYYMMDD HHMMSSmmm DATE */
[12][09][0-9][0-9][0-1][0-9][0-3][0-9][ ][0-2][0-9][0-5][0-9][0-5][0-9][0-9][0-9][0-9]          { read_date_YYYYMMDD_HHMMSSmmm(); return DATETIME; }

                                                                                                /* FLOAT NUMBER */
[0-9]+\.[0-9]+                                                                                  { read_float_number(); return FLOAT_NUMBER; }

                                                                                                /* INTEGER NUMBER */
[0-9]+                                                                                          { read_integer_number(); return INTEGER_NUMBER; }

                                                                                                /* PASS ',' CHARACTER */
,                                                                                               { return ','; } 

                                                                                                /* PASS '\n' CHARACTER */
\n                                                                                              { return '\n'; } 

                                                                                                /* PASS UNEXPECTED CHARACTER */
.                                                                                               { return yytext[0]; }


%%

/* READ FLOAT NUMBER */
void read_float_number(void) {
        printf("void read_float_number(void)\n");
        printf("#%s#\n", yytext);
        sscanf(yytext, "%lf", &yylval.float_number);
        printf("%lf\n", yylval.float_number);
}

/* READ INTEGER NUMBER */
void read_integer_number(void) {
        printf("void read_integer_number(void)\n");
        printf("#%s#\n", yytext);
        sscanf(yytext, "%ld", &yylval.integer_number);
        printf("%ld\n", yylval.integer_number);
}

/* READ YYYYMMDD HHMMSSmmm DATE */
void read_date_YYYYMMDD_HHMMSSmmm(void) {

        printf("void read_date_YYYYMMDD_HHMMSSmmm(void)\n");
        printf("#%s#\n", yytext);

        /*  DATETIME STRUCT TM */
        struct tm dt;

        /* READ VALUES */
        sscanf(yytext, "%4d%2d%2d %2d%2d%2d", &dt.tm_year, &dt.tm_mon, &dt.tm_mday, &dt.tm_hour, &dt.tm_min, &dt.tm_sec);

        /* NORMALIZE VALUES */
        dt.tm_year = dt.tm_year - 1900;         /* NORMALIZE YEAR */
        dt.tm_mon = dt.tm_mon - 1;              /* NORMALIZE MONTH */
        dt.tm_isdst = -1;                       /* NO INFORMATION ABOUT DST */
        mktime(&dt);                            /* NORMALIZE STRUCT TM */

        /* PRINT DATE TIME */
        char buffer[80];
        strftime(buffer, 80, "%c %Z", &dt);
        printf("%s\n", buffer);

        /* COPY STRUCT TM TO YACC RETURN VALUE */
        memcpy(&yylval.datetime, &dt, sizeof(dt));


}

The yacc grammar (grammar.y):

%{

#include <time.h>
#include <stdio.h>

%}

%union {

        struct tm       datetime;               /* DATE TIME VALUES */
        double          float_number;           /* 8 BYTES DOUBLE VALUE */
        long            integer_number;         /* 8 BYTES INTEGER VALUE */

}

%token  <datetime>              DATETIME
%token  <float_number>          FLOAT_NUMBER
%token  <integer_number>        INTEGER_NUMBER

%%

input:                          /* empty */
                        | input lastbid_lastask

lastbid_lastask:        DATETIME ',' FLOAT_NUMBER ',' FLOAT_NUMBER ',' INTEGER_NUMBER '\n'      { printf("MATCH %lf %lf %ld\n", $3, $5, $7); }
                        ;

%%

extern FILE *yyin;

int main(int argc, char *argv[]) {

        while(!feof(yyin)) {
                yyparse();
        }
        return 0;

}

makefile:

% cat makefile 
CCFLAGS = -std=c89 -c
YFLAGS = -d     # Forces generation of y.tab.h
OBJS = lexer.o grammar.o
TARGET = readfile

readfile:               $(OBJS)
                        cc $(OBJS) -std=c89 -ll -o $(TARGET)

grammar.h grammar.o:    grammar.y
                        yacc $(YFLAGS) -ogrammar.c grammar.y
                        cc $(CCFLAGS) grammar.c

lexer.o:                lexer.l grammar.h
                        lex -olexer.c lexer.l
                        cc $(CCFLAGS) lexer.c

clean:
                        rm -f $(OBJS) grammar.[ch] lexer.c

Now I compile the program and there are no errors, but when I try to run it I get this:

% cat test.csv | ./readfile
Segmentation fault (core dumped)

Now, if I replace:

while(!feof(yyin)) 

with:

while(1) 

then I get:

% cat test.csv | ./readfile
void read_date_YYYYMMDD_HHMMSSmmm(void)
#20191201 170003296#
Sun Dec  1 17:00:03 2019 CET
void read_float_number(void)
#1.102290#
1.102290
void read_float_number(void)
#1.102470#
1.102470
void read_integer_number(void)
#0#
0
MATCH 1.102290 1.102470 0
void read_date_YYYYMMDD_HHMMSSmmm(void)
#20191201 170004413#
Sun Dec  1 17:00:04 2019 CET
void read_float_number(void)
#1.102320#
1.102320
void read_float_number(void)
#1.102470#
1.102470
void read_integer_number(void)
#0#
0
...

So it works, but the program does not terminate at EOF. I know a core dump can mean many things, but what can I do to locate the issue and get normal behaviour?

Recommended answer

Don't call yyparse() in a loop. It parses the entire input and then returns; when it returns, you know that either the whole input has been parsed or a syntax error was encountered. There should be no need for any EOF testing.

(There are isolated cases in which you need to break this rule. Most of them involve either a scanner that returns end-of-input indicators other than at the end of the input, or a parser that uses YYACCEPT/YYABORT in order to terminate the parse prematurely. In other words, if you have a case where you need to break this rule, you already know you have to.)
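As a rough sketch of that advice, the main function in the epilogue of grammar.y could be reduced to a single call. This is a fragment rather than a standalone program: it assumes the rest of grammar.y is unchanged, that <stdio.h> is already included in the prologue (as it is in the question), and that yyin is left alone so the scanner reads standard input. The error message is only illustrative.

```c
/* Epilogue of grammar.y (sketch): call yyparse() exactly once. */
int main(void) {
        /* yyparse() keeps asking yylex() for tokens until yylex() returns 0
         * (end of input). It returns 0 if the whole input was accepted and
         * a nonzero value if the parse failed. */
        int status = yyparse();

        if (status != 0) {
                fprintf(stderr, "parse failed\n");      /* illustrative message */
        }
        return status == 0 ? 0 : 1;
}
```

With this version there is no loop and no feof call, so the program should simply exit once the piped input is exhausted.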

while (!feof(file)) {…} has a whole FAQ entry explaining why it is almost always a bug. (Summary: the EOF flag is set only after a read detects EOF, so the fact that it is not set before you do the read proves nothing. The while (!feof(file)) idiom pretty much guarantees that at the end of the file you will get an unexpected EOF, unexpected in the sense of "But I just checked for EOF...".)
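The off-by-one behaviour described in that summary can be seen in a small, self-contained C sketch that has nothing to do with yacc or lex. Both function names are made up for this illustration. The first loop tests feof() before reading and therefore runs once more than there are characters; the second lets the read itself report the end of input.

```c
#include <assert.h>
#include <stdio.h>

/* Buggy idiom: feof() is tested BEFORE the read that would detect EOF,
 * so the loop body runs one extra time, with fgetc() returning EOF. */
long iterations_with_feof_loop(FILE *fp) {
        long iterations = 0;
        assert(fp != NULL);             /* feof(NULL) is undefined behaviour */
        while (!feof(fp)) {
                int c = fgetc(fp);      /* on the last pass this is EOF */
                (void)c;                /* a real program would wrongly use c here */
                iterations++;
        }
        return iterations;
}

/* Usual fix: let the read report the end of input. */
long chars_with_checked_read(FILE *fp) {
        long count = 0;
        int c;
        assert(fp != NULL);
        while ((c = fgetc(fp)) != EOF) {
                count++;
        }
        return count;
}
```

For a stream containing the three characters "abc", the first function reports four iterations and the second reports three characters.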

I don't think the FAQ covers this particular issue, though, which is specific to programs using (f)lex. When a (f)lex scanner hits the end of file, it sets yyin to NULL. Then, if yywrap tells it that there is no more input, yylex returns 0, which tells its caller (yyparse) that the end of the file was reached. yyparse then finishes the parse and returns. If you loop at that point, yyin is NULL, and feof(NULL) is undefined behaviour. That's why your program segfaulted.

When you remove the feof test (but still loop), you re-enter yyparse, but this time with yyin set to NULL. The flex scanner takes that to mean "use the default input", i.e. stdin. If yyin was previously some input file, the new invocation of yyparse will try to read its input from the terminal, which is probably not what you expected. On the other hand, if it was stdin that reached EOF, you will just be in a hard loop, continuously receiving new EOF signals from stdin.
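The hard loop can be pictured with plain stdio, independent of flex. In the sketch below (the function name is hypothetical), a stream is read to its end and then read again several times; every additional read on the exhausted stream reports EOF immediately instead of blocking. That is why an endless loop around yyparse() spins once a piped stdin has run out.

```c
#include <assert.h>
#include <stdio.h>

/* Drain the stream, then attempt `extra_reads` more reads and count how
 * many of them report EOF straight away. */
int eof_results_after_end(FILE *fp, int extra_reads) {
        int eof_count = 0;
        int i;
        assert(fp != NULL);
        while (fgetc(fp) != EOF) {
                /* consume everything up to the first EOF */
        }
        for (i = 0; i < extra_reads; i++) {
                if (fgetc(fp) == EOF) {         /* nothing new ever arrives */
                        eof_count++;
                }
        }
        return eof_count;
}
```

Each pass of while(1) { yyparse(); } in the question behaves like one of those extra reads: the scanner sees end of input at once, yyparse returns, and the loop starts over.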
