Freeing the string allocated in strdup() from flex/bison


Question

I have flex code that copies a string lexeme using strdup().

%{
#include <stdlib.h>   /* atoi */
#include <string.h>   /* strdup */
#include "json.tab.h"
#define YY_DECL extern "C" int yylex()

%}
%option noyywrap

%%

[ \t\n]+ ; 
\"[a-zA-Z]+\" {yylval.sval = strdup(yytext); return STRING; }
[0-9]+ {yylval.ival = atoi(yytext); return NUMBER; }
. {return yytext[0];} ; 

%%

strdup() allocates memory, copies the input string into it, and returns a pointer to the copy (see "strdup() - what does it do in C?"), so I guess I need to free it when I no longer need it.

Following the post "When is %destructor invoked in BISON?", I added %destructor { free($$); printf("free");} STRING to the yacc file.

However, I never see free() being invoked, even when yylval.sval is assigned a new string returned from strdup().

What might be wrong? How do I free the allocated string in flex/bison?

Added

I am thinking about using a statically allocated sval, as follows:

%union {
    int ival;
    char sval[100]; // char* sval;
}

The flex code now becomes (without the code that checks whether yytext is shorter than 100 bytes):

\"[a-zA-Z]+\" {
    //yylval.sval = strdup(yytext);
    memset(yylval.sval, 0, 100);
    strcpy(yylval.sval, yytext);
    return STRING; 
}

I'm not sure whether this approach is what people normally use.

Added2

For my application, simple interning is OK.

extern char buffer[]; // [100];
%}
%option noyywrap

%%

\"[a-zA-Z]+\" {
        //yylval.sval = strdup(yytext);
        memset(buffer, 0, 100);
        strcpy(buffer, yytext);
        yylval.sval = buffer;
        return STRING; 
    }
...

char buffer[100];

For the yacc code:

%union {
    int ival;
    char *sval; 
}

Solution

As you say, you need to free the string "when I don't need it anymore." It's as simple (or complicated) as that.

C does not have a garbage collector, and C programmers are therefore responsible for knowing when allocated memory is no longer needed. The language does not attempt to figure it out, and (mostly) neither does bison.

If you have a reduction rule which is provided with one or more semantic values containing pointers to allocated memory, that rule might do any of a number of things. It might pass the semantic values into the new semantic value, typically by copying only the pointer. It might copy the semantic value, and then free the original. It might add the semantic value to a parse-global datastructure, like a symbol table.

In all of those cases, the programmer should be aware of whether or not the allocated memory is still required, and should free the allocation if it is not.
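As a rough illustration of the first case (passing a pointer into a new semantic value), here is a minimal C sketch. The names kv_pair, copy_string, kv_pair_new and kv_pair_free are hypothetical and do not come from the question; the point is only that each allocation has exactly one owner at any time, and that owner is the one who eventually calls free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node built by a reduction action from a STRING and a NUMBER. */
typedef struct kv_pair {
    char *key;   /* owned by the node once kv_pair_new() succeeds */
    int   value;
} kv_pair;

/* Hypothetical helper: heap-allocate a copy of s, as strdup() would. */
char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Takes ownership of key: only the pointer is copied, not the text.
 * After this call the caller must not free key itself. */
kv_pair *kv_pair_new(char *key, int value) {
    kv_pair *p = malloc(sizeof *p);
    if (!p) {
        free(key);   /* ownership was handed over, so clean up on failure */
        return NULL;
    }
    p->key = key;
    p->value = value;
    return p;
}

/* Releases the node together with the string it owns. */
void kv_pair_free(kv_pair *p) {
    if (!p) return;
    free(p->key);
    free(p);
}
```

In a bison action this might be used as something like $$ = kv_pair_new($1, $3); in which case $1 must not be freed in that action, because the new node now owns it. If the action only printed $1 and did not keep it, the action itself would be the place to call free($1).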

However, there are a few cases in which bison will discard a semantic value without it ever being presented to a reduction action. Most of these are error conditions. If, as part of error recovery, bison decides to discard a token, that token's semantic value could leak memory. It is precisely for this case that bison has a %destructor declaration. The %destructor code is called if (and only if) bison discards the token as a result of error recovery or post-error clean-up. That is why you never see your free() being called during a normal, error-free parse. All other cases are your responsibility.

Trying to evade this responsibility by making stack slots enormous (such as including a char[100] in the semantic value union) is both unsafe and inefficient. It's unsafe because you need to be constantly aware that the fixed-size buffer could overflow, meaning that parsing a syntactically valid program might overwrite arbitrary memory. It's inefficient because you end up making the stack far larger than necessary, and also because you end up constantly copying the stack slots (at least twice for every reduction rule, even the ones which use the default action).
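To make the size and safety points concrete, here is a small C sketch that compares the two union layouts from the question and shows the kind of length check a fixed buffer would need. The union tags and the function names (array_slot_size, pointer_slot_size, copy_bounded) are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Semantic value that stores the text inside the stack slot itself. */
union value_with_array {
    int  ival;
    char sval[100];
};

/* Semantic value that stores only a pointer to text kept elsewhere. */
union value_with_pointer {
    int   ival;
    char *sval;
};

/* Size of one parser stack slot under each layout. */
size_t array_slot_size(void)   { return sizeof(union value_with_array); }
size_t pointer_slot_size(void) { return sizeof(union value_with_pointer); }

/* Copy src into dst only if it fits, including the terminating NUL.
 * Returns 1 on success; returns 0 and leaves dst untouched otherwise. */
int copy_bounded(char *dst, size_t cap, const char *src) {
    size_t n = strlen(src);
    if (cap == 0 || n >= cap) return 0;
    memcpy(dst, src, n + 1);
    return 1;
}
```

Every slot of the array-based union is at least 100 bytes, while the pointer-based one is only as large as a pointer on typical platforms. A lexer action that used the fixed buffer would have to call something like copy_bounded() and decide what to do with a token that does not fit, which the unchecked strcpy() in the question does not do.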

Figuring out the lifetime of a semantic value is only complicated if you intend to share memory. That's not usually useful for string literals (as in your example) but it can be quite helpful for variable names; most names occur more than once in a program, so there is always the temptation to use the same character string for each occurrence.

I usually solve the identifier problem by "interning" the string in the lexer. The lexer maintains a parse-global name table -- say, a simple set implemented with a hash-table -- and for each identifier it encounters, it adds the identifier to the name table and passes the unique name entry pointer as the semantic value. At some point after the end of the parse, the entire name table can be freed, freeing all the identifiers.
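The interning idea above can be sketched in plain C roughly as follows. This is only one possible shape, assuming a fixed number of buckets and a simple multiplicative hash; the names name_entry, intern and free_name_table are hypothetical and not part of flex or bison.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_TABLE_BUCKETS 256   /* arbitrary size chosen for this sketch */

/* One interned string, chained with the others in the same bucket. */
typedef struct name_entry {
    struct name_entry *next;
    char *name;
} name_entry;

static name_entry *name_table[NAME_TABLE_BUCKETS];

/* Simple multiply-by-33 string hash. */
static unsigned long hash_name(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the unique stored copy of s, adding it the first time it is seen.
 * Returns NULL if memory allocation fails. */
const char *intern(const char *s) {
    unsigned long b = hash_name(s) % NAME_TABLE_BUCKETS;
    for (name_entry *e = name_table[b]; e; e = e->next)
        if (strcmp(e->name, s) == 0) return e->name;

    name_entry *e = malloc(sizeof *e);
    if (!e) return NULL;
    size_t n = strlen(s) + 1;
    e->name = malloc(n);
    if (!e->name) { free(e); return NULL; }
    memcpy(e->name, s, n);
    e->next = name_table[b];
    name_table[b] = e;
    return e->name;
}

/* Free every interned string. Call once, after the parse result
 * no longer refers to any of them. */
void free_name_table(void) {
    for (size_t b = 0; b < NAME_TABLE_BUCKETS; b++) {
        name_entry *e = name_table[b];
        while (e) {
            name_entry *next = e->next;
            free(e->name);
            free(e);
            e = next;
        }
        name_table[b] = NULL;
    }
}
```

With a table like this, the lexer action would assign the result of intern(yytext) to yylval.sval (declaring sval as const char * in the %union, or casting), and no reduction action or %destructor would free individual strings; free_name_table() releases them all at the end.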

For string literals and other probably-unique strings, you could either use the name table anyway, or you could avoid ever having two copies of a pointer to the same character string. Using the name table has the advantage of reducing the amount of work you need to do in memory management, but at the cost of possibly keeping unnecessary strings around for extra time. That depends a lot on the nature of the parse result: if it is an AST, then you probably need to keep the character strings as long as the AST exists, but if you are doing direct execution or one-pass code generation, you might not need the string literals in the long haul.
