Is returning a dereferenced pointer as a reference from a function undefined behavior?

Problem description

I'm writing a parser for the first time. I'm following this tutorial on Pratt parsers. I've got it to work, but I've come up with sort of a problem.

The original tutorial is written in Java. I prefer C++, so that's what I wrote mine in. I was able to port most of the code to C++ fairly directly (although I did make it "mine", in the sense that there are some differences unrelated to the language). The only real problem I have is with this code:

public Expression parse(Parser parser, Token token) {
    Expression operand = parser.parseExpression();
    return new PrefixExpression(token.getType(), operand);
}

This works fine in Java (I'm assuming. I've never really worked with Java before, but I assume the guy knows what he's doing), but in C++ not so much. I was able to accomplish the same thing by using pointers like so:

Expression* parse(Parser& parser, Token token) {
    Expression* operand = parser.parseExpression();
    return new PrefixExpression(token.getType(), operand);
}

Although I'm unfamiliar with Java's semantics, this seems to do exactly the same thing in C++, only with pointers instead of plain objects.

However, the problem with working with pointers like this is that it gets messy fast. Now it is much easier for everything to work with pointers, which means I have to worry about deallocation, and possibly memory leaks if I don't do it right. It just becomes a mess.

Now, the solution seems easy. I could just return PrefixExpression like this:

Expression parse(Parser& parser, Token token) {
    Expression operand = parser.parseExpression();
    return PrefixExpression(token.getType(), operand);
}

Here's my problem: if I do it like this, the PrefixExpression gets sliced. Only its Expression base part is copied into the return value, so I lose the derived class's virtual overrides and any extra data in this new Expression. That's a problem since Expression is actually just a base class for many types of expressions. Parse can parse anything it wants to, not just a PrefixExpression. That's how the original was designed. Generally I like that design, but, as you can see, it's causing problems. Simply returning a new Expression here loses things I need from that object later.

Now, I can try to solve this by returning a reference:

Expression& parse(Parser& parser, Token token) {
    // ...
    return PrefixExpression(token.getType(), operand);
}

That solves the vtable and extra data problem, but it creates a new one: I'm returning a reference to a temporary that is destroyed as soon as the function returns, so the caller is left with a dangling reference, which is of no help.

All of this is to say: that's why I ultimately went with pointers. Pointers let me keep the data I need later, but they are really hard to work with. I can squeeze by, but personally I'd like something better.

I think I could use std::move, but I'm not familiar enough with it to be certain I'd be using it properly. If I have to, I will, but implementing that properly takes skill and knowledge I just don't have. Besides, reworking everything I've written so far to work that way would be a lot of effort.

All of that to lead to the main point of my question: can I simply just return a reference to a new object safely? Let me just show an example:

Expression& parse(Parser& parser, Token token) {
    //...
    return *(new PrefixExpression(token.getType(), operand));
}

This would be nice and solve most of my problems because, if it does what I think it does, I get a reference to a new object, keep the vtable and extra data, and it doesn't get destroyed immediately. This would let me have my cake and eat it too.

However, my question is: can I actually do this? While I feel I have a good reason to do this, it seems very weird to me. I'm allocating new data inside a function and expecting it to be deallocated automatically outside the function, like any normal variable. Even if that did work, would it behave as I expect once we are completely outside this function? I'm worried that this might invoke undefined behavior or something like that. What does the standard say about this?

So here is the requested minimal sample:

Expression:

    // A (not really pure) purely virtual base class that holds all types of expressions
    class Expression {
        protected:
            const std::string type;
        public:
            Expression() : type("default") {}
            virtual ~Expression() {} //Because I'm dealing with pointers, I *think* I need a virtual destructor here. Otherwise, I don't really need 

            virtual operator std::string() {
                // Since I am working with a parser, I want some way to debug and make sure I'm parsing correctly. This was the easiest.
                throw ("ERROR: No conversion to std::string implemented for this expression!");
            }
            // Keep in mind, I may do several other things here, depending on how I want to use Expression
    };

A child Expression, for parentheses:

    class Paren : public Expression {
        private:
            // Again, Pointer is not my preferred way, but this was just easier, since Parse() was returning a pointer anyway.
            Expression* value;
        public:
            Paren(Expression *e) {
                // I know this is also sketchy. I should be trying to perform a copy here. 
                // However, I'm not sure how to do this, since Expression could be anything.
                // I just decided to write my code so the new object takes ownership of the pointer. I could and should do better.
                value = e;
            }

            virtual operator std::string() {
                return "(" + std::string(*value) + ")";
            }

            // Because again, I'm working with pointers
            ~Paren() {delete value;}
    };

And a parser:

class Parser {
    private:
        Grammar::Grammar grammar;
    public:
        // this is just a function that creates a unique identifier for each token.
        // Tokens normally have types identifier, number, or symbol.
        // This would work, except I'd like to make grammar rules based off
        // the type of symbol, not all symbols in general
        std::string GetMapKey(Tokenizer::Token token) {
                if(token.type == "symbol") return token.value;
                return token.type;
        }
        // the parsing function
        Expression * parseExpression(double precedence = 0) {
            // the current token
            Tokenizer::Token token = consume();

            // detect and throw an error here if we have no such prefix
            if(!grammar.HasPrefix(GetMapKey(token))) {
                throw("Error! Invalid grammar! No such prefix operator.");
            }

            // get a prefix parselet
            Grammar::PrefixCallback preParse = grammar.GetPrefixCallback(GetMapKey(token));

            // get the left side
            Expression * left = preParse(token,*this);

            token = peek();

            while(precedence < peekPrecedence() && grammar.HasInfix(GetMapKey(token))) {
                // we peeked the token; now that we know there are no errors, consume it
                token = consume();

                // get the infix parser
                Grammar::InfixCallback inParse = grammar.GetInfixCallback(GetMapKey(token));

                // and get the in-parsed token
                left = inParse(token,left,*this);

                // look at the next token before re-checking the loop condition
                token = peek();
            }

            return left;
        }
};

After I posted the parser code, I realized I should mention that I put all the grammar-related stuff into its own class. It has some nice utilities related to grammar, and it lets us write a grammar-independent parser and worry about the grammar later:

    class Grammar {
        public:
            // I'm in Visual Studio 2010, which doesn't seem to like the using type = value; syntax, so this instead
            typedef std::function<Expression*(Tokenizer::Token,Parser&)> PrefixCallback;
            typedef std::function<Expression*(Tokenizer::Token, Expression*, Parser&)> InfixCallback;
        private:
            std::map<std::string, PrefixCallback> prefix;
            std::map<std::string, InfixCallback> infix;
            std::map<std::string, double> infixPrecedence; // we'll use double precedence for more flexibility
            double prefixBindingPower; // assigned in the constructor below
        public:
            Grammar() {
                prefixBindingPower = std::numeric_limits<double>::max();
            }

            void RegisterPrefix(std::string key, PrefixCallback c) {
                prefix[key] = c;
            }

            PrefixCallback GetPrefixCallback(std::string key) {
                return prefix[key];
            }

            bool HasPrefix(std::string key) {
                return prefix.find(key) != prefix.end();
            }

            void RegisterInfix(std::string key, InfixCallback c, double p) {
                infix[key] = c;
                infixPrecedence[key] = p;
            }

            InfixCallback GetInfixCallback(std::string key) {
                return infix[key];
            }

            double GetInfixPrecedence(std::string key) {
                return infixPrecedence[key];
            }

            bool HasInfix(std::string key) {
                return infix.find(key) != infix.end();
            }
    };

Finally, I probably need to show a parsing callback to complete the set:

    Expression* ParenPrefixParselet(Tokenizer::Token token, Parser& parser) {
        Expression* value = parser.parseExpression(0);
        Expression* parenthesis = new Paren(value); // ownership of value passes to our new expression. No need to delete
        parser.consume(")");

        return parenthesis;
    }

That allows me to write a grammar that allows for things in parentheses like this:

Grammar g;
g.RegisterPrefix("(", &ParenPrefixParselet);

And finally, main():

int main() {
    Grammar g;
    g.RegisterPrefix("(", &ParenPrefixParselet);
    Parser parser(g);

    Expression* e = parser.parseExpression(0);

    std::cout << static_cast<std::string>(*e);

    return 0;
}

Believe it or not, I think that's pretty minimal. Remember, this is a parser. Keep in mind that this is a minimal example that I plan to expand, but hopefully you get the idea.

Recommended answer

You want to use polymorphism, and there are two ways to get it: references or pointers. The problem with references is that returning them is dangerous. Most of the time you end up returning a reference to a local object, and using that reference after the function returns is UB (undefined behavior). That leaves us with pointers.

But don't use raw new and delete. They are unsafe and hard to deal with, especially when ownership crosses multiple scopes. Use a smart pointer, specifically unique_ptr:

#include <memory>

struct expression {
    virtual void foo() = 0;
    virtual ~expression() = default;
};

struct prefix_expression : expression {
    virtual void foo() { /* default impl */ }

    // dummy c-tor
    prefix_expression(int) {}
};

// note that parse() returns a pointer to any *expression*!
std::unique_ptr<expression> parse() {
    // pass to make_unique whatever arguments the constructor of prefix_expression needs
    return std::make_unique<prefix_expression>(42);
}

int main() {
    {
        auto expr = parse();
        // here, *expr* goes out of scope and its unique_ptr deletes the object it owns
    }
}

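To also answer the question in the title: no. Returning *(new PrefixExpression(...)) as a reference is well-defined, because the object lives on the heap until it is deleted. It is not freed automatically, though: someone must eventually call delete on its address (delete &ref;), otherwise the object leaks.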
