使用Bison递归规则的麻烦,并使用它来存储值 [英] Troubles using Bison's recursive rules, and storing values using it

查看:58
本文介绍了使用Bison递归规则的麻烦,并使用它来存储值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试为Newick文件格式树制作一个flex + bison扫描器和解析器,以便对其进行操作.解释的实现语法基于简化((标签和长度始终是相同类型,由flex返回))此示例.

I am trying to make a flex+bison scanner and parser for Newick file format trees in order to do operations on them. The implemented grammar an explanation is based on a simplification of (labels and lengths are always of the same type, returned by flex) this example.

从本质上讲,这是一种文件格式的解析器,该解析器表示具有一系列(递归)子树和/或叶子的树.主树将始终以; 结尾,并且所述树和其中的所有子树将在()之间包含一系列节点,并命名为以及由 name :length 指定的最右括号右边的长度,它们是可选的(您可以避免指定它们,将其中之一( name:length ),或两者都带有 name:length ).

This is esentially a parser for a file format which represents a tree with a series of (recursive) subtrees and/or leaves. The main tree will always end on ; and said tree and all subtrees within will contain a series of nodes between ( and ), with a name and a length to the right of the rightmost parenthesis specified by name and :length, which are optional (you can avoid specifying them, put one of them (name or :length), or both with name:length).

如果任何节点缺少名称或长度,则将应用默认值.(例如:'missingName''1')

If any node lacks either the name or a length, default values will be applied. (for example: 'missingName' and '1')

一个例子是(child1:4,child2:6)root:6; ((child1Of1:2,child2Of1:9)child1:5,child2:6)root:6;

所述语法的实现如下(注意:我翻译了自己的代码,就像使用我的语言一样,为了清晰起见,删除了许多附带的内容):

The implementation of said grammar is the following one (NOTE: I translated my own code, as it was in my language, and lots of side stuff got removed for clarity):

    struct node {
        char* name; /*the node's assigned name, either from the file or from default values*/
        float length; /*node's length*/
    } dataOfNode;
}

 

%start tree
 
%token<dataOfNode> OP CP COMMA SEMICOLON COLON DISTANCE NAME

%type<dataOfNode> tree subtrees recursive_subtrees subtree leaf


%%

tree:   subtrees NAME COLON DISTANCE SEMICOLON          {} // with name and distance    
        | subtrees NAME SEMICOLON                       {} // without distance
        | subtrees COLON DISTANCE SEMICOLON             {} // without name
        | subtrees SEMICOLON                            {} // without name nor distance
        ;
    
    
subtrees:   OP recursive_subtrees CP                    {}
            ;   
        
            
recursive_subtrees: subtree                             {} // just one subtree, or the last one of the list         
                    | recursive_subtrees COMMA subtree  {} // (subtree, subtree, subtree...)        
    

subtree:    subtrees NAME COLON DISTANCE    {   $$.NAME= $2.name; $$.length = $4.length; $$.lengthAcum = $$.lengthAcum + $4.length;
                                                } // group of subtrees, same as the main tree but without ";" at the end, with name and distance                                        
            
            | subtrees NAME                 {   $$.name= $2.name; $$.length = 1.0;}             // without distance                                 
            
            | subtrees COLON DISTANCE       {   $$.name= "missingName"; $$.length = $3.length;} // without name                             
           
            | subtrees                      {   $$.name= "missingName"; $$.length = 1.0;}       // without name nor distance                            
            
            | leaf                          {   $$.name= $1.name; $$.length = $1.length;}       // a leaf
                    
                    
                    
leaf:   NAME COLON DISTANCE {   $$.name= $$.name; $$.length = $3.length;}       // with name and distance
        | NAME              {   $$.name= $1.name; $$.length = 1.0;}             // without distance
        | COLON DISTANCE    {   $$.name= "missingName"; $$.length = $2.length;} // without name
        |                   {   $$.name= "missingName"; $$.length = 1.0;}       // without name nor distance
        ;


%%

现在,让我们说我想区分谁是每个子树和叶子的父级,以便我可以将父子树的长度与最长"子级的长度进行累加.孩子,递归地.

Now, let's say that I want to distinguish who is the parent of each subtree and leaf, so that I can accumulate the length of a parent subtree with the length of the "longest" child, recursively.

我不知道我是否为此选择了错误的设计,但我无法摆脱为每个子树(和叶子,也被认为是子树)分配名称和长度,以及我也不了解递归如何在匹配过程中识别父母的方式.

I do not know if I chose a bad design for this, but I can't get past assigning names and lengths to each subtree (and leaf, which is also considered a subtree), and I don't think I understand either how recursivity works in order to identify the parents in the matching process.

推荐答案

这主要是定义要保留树的数据结构,然后将其自底向上"构建的问题.在规则的行动中.自下而上"这部分是对北美野牛解析器工作方式的重要暗示-它们是自下而上"的,从语法的叶子中识别出结构,然后将它们组装成更高的非末端(最终变成起始的非末端),将是最后一次执行操作).您也可以通过没有太多冗余规则来简化事情.最后,IMO最好将字符文字用于单个字符标记,而不是名称.因此,您可能最终得到:

This is mostly a matter of defining the data structure you want to hold your trees, and building that "bottom up" in the actions of the rules. The "bottom up" part is an important implication of the way that bison parsers work -- they are "bottom up", recognizing constructs from the leaves of the grammar and then assembling them into higher non-terminals (and ulitimately into the start non-terminal, which will be the last action run). You can also simplify things by not having so many redundant rules. Finally, IMO it's always better to use character literals for single character tokens rather than names. So you might end up with:

%{
struct tree {
    struct tree  *next;     /* each tree is potentially part of a list of (sub)trees */
    struct tree  *subtree;  /* and may contain a subtress */
    const char   *name;
    double       length;
};

struct tree *new_leaf(const char *name, double length);   /* malloc a new leaf "tree" */
void append_tree(struct tree **list, struct tree *t);  /* append a tree on to a list of trees */
%}

%union {
    const char   *name;
    double       value;
    struct tree  *tree;
}

%type<tree> subtrees recursive_subtrees subtree leaf
%token<name> NAME
%token<value> DISTANCE

%%

tree: subtrees leaf ';'  { $2->subtree = $1;  print_tree($2); } 
    ;
    
subtrees: '(' recursive_subtrees ')'  { $$ = $2; }
        ;   
            
recursive_subtrees: subtree                         { $$ = $1; } // just one subtree, or the last one of the list
                  | recursive_subtrees ',' subtree  { append_tree(&($$=$1)->next, $3); } // (subtree, subtree, subtree...)
                  ;

subtree: subtrees leaf  { ($$=$2)->subtree = $1; }
       | leaf           { $$ = $1; }
       ;
                    
leaf: NAME ':' DISTANCE { $$ = new_leaf($1, $3);}              // with name and distance
    | NAME              { $$ = new_leaf($1, 1.0);}             // without distance
    | ':' DISTANCE      { $$ = new_leaf("missingName", $2; }   // without name
    |                   { $$ = new_leaf("missingName", 1.0); } // without name nor distance
    ;

%%

struct tree *new_leaf(const char *name, double length) {
    struct tree *rv = malloc(sizeof(struct tree));
    rv->subtree = rv->next = NULL;
    rv->name = name;
    rv->length = length;
    return rv;
}

void append_tree(struct tree **list, struct tree *t) {
    assert(t->next == NULL);  // must not be part of a list yet
    while (*list) list = &(*list)->next;
    *list = t;
}

这篇关于使用Bison递归规则的麻烦,并使用它来存储值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆