Parse bibtex with flex+bison: revisited

Problem description

For the last few weeks, I have been trying to write a parser for bibtex (http://www.bibtex.org/Format/) files using flex and bison.

$ cat raw.l
%{
#include "raw.tab.h" 
%}
value [\"\{][a-zA-Z0-9 .\t\{\} \"\\]*[\"\}]
%%
[a-zA-Z]*               return(KEY);
\"                          return(QUOTE);
\{                          return(OBRACE);
\}                          return(EBRACE);
;                           return(SEMICOLON);
[ \t]+                  /* ignore whitespace */;
{value}     {
    yylval.sval = malloc(strlen(yytext));
    strncpy(yylval.sval, yytext, strlen(yytext));
    return(VALUE);
}

$ cat raw.y
%{
#include <stdio.h>
%}

//Symbols.
%union
{
 char *sval;
};
%token <sval> VALUE
%token KEY
%token OBRACE
%token EBRACE
%token QUOTE
%token SEMICOLON 

%start Entry
%%

Entry:
     '@'KEY OBRACE VALUE ',' 
     KeyVal
     EBRACE
     ;

KeyVal:
      /* empty */
      | KeyVal '=' VALUE ','
      | KeyVal '=' VALUE 
      ;
%%

int yyerror(char *s) {
  printf("yyerror : %s\n",s);
}

int main(void) {
  yyparse();

}

A sample bibtex file is:

@Book{a1,
    author = "a {\"m}ook, Rudra Banerjee",
    Title="ASR",
    Publisher="oxf",
    Year="2010",
    Add="UK",
    Edition="1",
}
@Article{a2,
    Author="Rudra Banerjee",
    Title="Fe{\"Ni}Mo",
    Publisher={P{\"R}B},
    Issue="12",
    Page="36690",
    Year="2011",
    Add="UK",
    Edition="1",
}

When I try to parse it, it gives a syntax error. With GDB, it shows that it (probably) expects the fields in KEY to be declared:

Reading symbols from /home/rudra/Programs/lex/Parsing/a.out...done.
(gdb) Undefined command: "".  Try "help".
(gdb) Undefined command: "Author".  Try "help".
(gdb) Undefined command: "Editor".  Try "help".
(gdb) Undefined command: "Title".  Try "help".
.....

I would be grateful if someone could help me with this.

Recommended answer

Lots of problems. First, your lexer is confused: it tries to recognize quoted strings and braced things as a single VALUE while also trying to recognize single characters like " and {. For quotes, it makes sense to have the lexer recognize the whole string, but for structural things that you want to parse (like braced lists), you need to return single tokens for the parser to parse. Second, when allocating space for a string, you aren't allocating space for the NUL terminator: malloc(strlen(yytext)) is one byte short, and strncpy with that length never writes the terminator. Finally, your grammar looks odd: it wants to parse things like = VALUE = VALUE as a KeyVal, which doesn't correspond to anything in a bibtex file.
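The allocation problem can be illustrated in plain C. This is only a minimal sketch: copy_token is a hypothetical helper name, not something from the original code, and in practice strdup does the same job.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the original code): returns a heap copy of a
 * token's text, or NULL if the allocation fails. The caller frees it. */
char *copy_token(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    char *copy = malloc(len + 1);   /* +1 leaves room for the terminating NUL */
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len + 1);    /* copies the characters and the NUL */
    return copy;
}
```

In the flex action this would be called as yylval.sval = copy_token(yytext); in place of the malloc/strncpy pair.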

So first, the lexer. You want to recognize quoted strings and identifiers, but everything else should be returned as single characters:

[A-Za-z][A-Za-z0-9]*      { yylval.sval = strdup(yytext); return KEY; }
\"([^"\\]|\\.)*\"         { yylval.sval = strdup(yytext); return VALUE; }
[ \t\n]                   ; /* ignore whitespace */
[{}@=,]                   { return *yytext; }
.                         { fprintf(stderr, "Unrecognized character %c in input\n", *yytext); }
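To see what the quoted-string rule accepts, here is a hand-written C sketch of the same match: an opening double quote, then any run of ordinary characters or backslash-escaped pairs, then a closing double quote. The function name quoted_length is an assumption for illustration only. Flex generates its own matcher, so nothing like this is needed in the real lexer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the quoted string that starts
 * at s (both quotes included), or 0 if s does not begin with a complete
 * quoted string. */
size_t quoted_length(const char *s)
{
    assert(s != NULL);
    if (s[0] != '"')
        return 0;                   /* must start with a double quote */
    size_t i = 1;
    while (s[i] != '\0') {
        if (s[i] == '\\') {
            /* A backslash must be followed by one more character; flex's
             * "." does not match a newline, so that case is rejected too. */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;                 /* skip the escaped pair, e.g. \" */
        } else if (s[i] == '"') {
            return i + 1;           /* unescaped quote closes the string */
        } else {
            i++;                    /* ordinary character */
        }
    }
    return 0;                       /* ran off the end: unterminated */
}
```

The escaped-pair branch is why a value such as "Fe{\"Ni}Mo" is consumed as one token instead of stopping at the inner quote.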

Now you need a parser for the entries:

Input: /* empty */ | Input Entry ;  /* input is zero or more entries */
Entry: '@' KEY '{' KEY ',' KeyVals '}' ;
KeyVals: /* empty */ | KeyVals KeyVal ; /* zero or more keyvals */
KeyVal: KEY '=' VALUE ',' ;

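That should parse the quoted-string fields in the example you give. Note that a brace-delimited value such as Publisher={P{\"R}B} is not covered by these rules, since KeyVal only accepts a quoted VALUE after '='; handling it would need additional grammar rules.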
