Is returning a dereferenced pointer as a reference from a function undefined behavior?
Problem description
I'm writing a parser for the first time. I'm following this tutorial on Pratt parsers. I've got it to work, but I've run into a bit of a problem.
The original tutorial is written in Java. I prefer C++, so that's what I wrote mine in. I was able to port most of the code to C++ (although I did make it "mine", in the sense that there are some non-language-related differences). The only real problem I have is with this piece of code:
public Expression parse(Parser parser, Token token) {
    Expression operand = parser.parseExpression();
    return new PrefixExpression(token.getType(), operand);
}
This works fine in Java (I'm assuming; I've never really worked with Java before, but I assume the guy knows what he's doing), but in C++ not so much. I was able to accomplish the same thing by using pointers, like so:
Expression* parse(Parser& parser, Token token) {
    Expression* operand = parser.parseExpression();
    return new PrefixExpression(token.getType(), operand);
}
Which (although I am unfamiliar with the semantics of Java) seems to do the exact same thing in C++, only with pointers instead of normal objects.
However, the problem with working with pointers like this is that it gets messy kind of fast. Now it becomes much easier for everything to work with pointers, which means I have to worry about deallocation, and maybe memory leaks if I don't do it right. It just becomes a mess.
Now, the solution seems easy. I could just return PrefixExpression like this:
Expression parse(Parser& parser, Token token) {
    Expression operand = parser.parseExpression();
    return PrefixExpression(token.getType(), operand);
}
Here's my problem: if I do it like this, I lose the vtable and any extra data in this new Expression. That's a problem, since Expression is actually just a base class for many types of expressions. parse can parse anything it wants to, not just a PrefixExpression. That's how the original was designed. Generally, I like that design but, as you can see, it's causing problems. Simply returning a new Expression by value right here loses things I need from that object later.
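What is described here is usually called object slicing. A minimal sketch of it, using hypothetical stand-ins for the question's classes (the name() member and both parse functions are assumptions made for illustration, not part of the original code):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the question's classes, reduced to one virtual call.
struct Expression {
    virtual ~Expression() {}
    virtual std::string name() const { return "Expression"; }
};

struct PrefixExpression : Expression {
    std::string name() const override { return "PrefixExpression"; }
};

// Returns by value: only the Expression part of the temporary is copied out,
// so the result really is a plain Expression (the derived part is sliced off).
Expression parseByValue() {
    return PrefixExpression();
}

// Returns a pointer: the dynamic type survives, but the caller must delete it.
Expression* parseByPointer() {
    return new PrefixExpression();
}
```

Calling name() on the by-value result dispatches to Expression, while the pointer version still dispatches to PrefixExpression.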
Now, I can try to solve this by returning a reference:
Expression& parse(Parser& parser, Token token) {
    // ...
    return PrefixExpression(token.getType(), operand);
}
That solves the vtable and extra data problem, but it creates a new one: I'm returning a reference to an object that will be destroyed immediately, which is of no help.
All of this to say, that's why I originally went with pointers. Pointers let me keep the data I need later, but they are really hard to work with. I can squeeze by, but personally I'd like something better.
I think I could use std::move, but I'm not familiar enough with it to be certain I'd be using it properly. If I have to, I will, but implementing that properly takes some skill and knowledge I just don't have. Besides, it would be a lot of work to rework everything I have up to this point to work that way.
All of that leads to the main point of my question: can I simply return a reference to a new object safely? Let me show an example:
Expression& parse(Parser& parser, Token token) {
    //...
    return *(new PrefixExpression(token.getType(), operand));
}
This would be nice and would solve most of my problems because, if it does what I think it does, I get a reference to a new object, keep the vtable and extra data, and it doesn't get destroyed immediately. This would let me have my cake and eat it too.
However, my problem is: can I actually do this? While I feel I have a good reason to do it, it seems very weird to me. I'm allocating new data inside a function and expecting it to get deallocated outside the function automatically, like any normal variable. Even if that did work, would it behave as I expect outside this function? I am scared that this might invoke undefined behavior or something like that. What does the standard say about this?
So here is the requested minimal sample:
Expression:
// A (not really pure) purely virtual base class that holds all types of expressions
class Expression {
protected:
    const std::string type;
public:
    Expression() : type("default") {}
    virtual ~Expression() {} //Because I'm dealing with pointers, I *think* I need a virtual destructor here. Otherwise, I don't really need
    virtual operator std::string() {
        // Since I am working with a parser, I want some way to debug and make sure I'm parsing correctly. This was the easiest.
        throw ("ERROR: No conversion to std::string implemented for this expression!");
    }
    // Keep in mind, I may do several other things here, depending on how I want to use Expression
};
A child of Expression, for parentheses:
class Paren : public Expression {
private:
    // Again, Pointer is not my preferred way, but this was just easier, since Parse() was returning a pointer anyway.
    Expression* value;
public:
    Paren(Expression *e) {
        // I know this is also sketchy. I should be trying to perform a copy here.
        // However, I'm not sure how to do this, since Expression could be anything.
        // I just decided to write my code so the new object takes ownership of the pointer. I could and should do better
        value = e;
    }
    virtual operator std::string() {
        return "(" + std::string(*value) + ")";
    }
    // Because again, I'm working with pointers
    ~Paren() {delete value;}
};
And a parser:
class Parser {
private:
    Grammar::Grammar grammar;
public:
    // this is just a function that creates a unique identifier for each token.
    // Tokens normally have types identifier, number, or symbol.
    // This would work, except I'd like to make grammar rules based off
    // the type of symbol, not all symbols in general
    std::string GetMapKey(Tokenizer::Token token) {
        if(token.type == "symbol") return token.value;
        return token.type;
    }
    // the parsing function
    Expression * parseExpression(double precedence = 0) {
        // the current token
        Token token = consume();
        // detect and throw an error here if we have no such prefix
        if(!grammar.HasPrefix(GetMapKey(token))) {
            throw("Error! Invalid grammar! No such prefix operator.");
        }
        // get a prefix parselet
        Grammar::PrefixCallback preParse = grammar.GetPrefixCallback(GetMapKey(token));
        // get the left side
        Expression * left = preParse(token,*this);
        token = peek();
        double debug = peekPrecedence();
        while(precedence < peekPrecedence() && grammar.HasInfix(GetMapKey(token))) {
            // we peeked the token, now we should consume it, now that we know there are no errors
            token = consume();
            // get the infix parser
            Grammar::InfixCallback inParse = grammar.GetInfixCallback(GetMapKey(token));
            // and get the in-parsed token
            left = inParse(token,left,*this);
        }
        return left;
    }
    // (the constructor, consume(), peek() and peekPrecedence() are not shown)
};
After I posted the parser code, I realized I should mention that I put all the grammar-related stuff into its own class. It just has some nice grammar-related utilities, and it lets us write a grammar-independent parser and worry about the grammar later:
class Grammar {
public:
    // I'm in visual studio 2010, which doesn't seem to like the using type = value; syntax, so this instead
    typedef std::function<Expression*(Tokenizer::Token,Parser&)> PrefixCallback;
    typedef std::function<Expression*(Tokenizer::Token, Expression*, Parser&)> InfixCallback;
private:
    std::map<std::string, PrefixCallback> prefix;
    std::map<std::string, InfixCallback> infix;
    std::map<std::string, double> infixPrecedence; // we'll use double precedence for more flexibility
public:
    Grammar() {
        prefixBindingPower = std::numeric_limits<double>::max();
    }
    void RegisterPrefix(std::string key, PrefixCallback c) {
        prefix[key] = c;
    }
    PrefixCallback GetPrefixCallback(std::string key) {
        return prefix[key];
    }
    bool HasPrefix(std::string key) {
        return prefix.find(key) != prefix.end();
    }
    void RegisterInfix(std::string key, InfixCallback c, double p) {
        infix[key] = c;
        infixPrecedence[key] = p;
    }
    InfixCallback GetInfixCallback(std::string key) {
        return infix[key];
    }
    double GetInfixPrecedence(std::string key) {
        return infixPrecedence[key];
    }
    bool HasInfix(std::string key) {
        return infix.find(key) != infix.end();
    }
};
Finally, I probably need to show a parsing callback to complete the set:
Expression* ParenPrefixParselet(Tokenizer::Token token, Parser& parser) {
    Expression* value = parser.parseExpression(0);
    Expression* parenthesis = new Paren(value); // control of value gets given to our new expression. No need to delete
    parser.consume(")");
    return parenthesis;
}
That allows me to write a grammar that allows for things in parentheses, like this:
Grammar g;
g.RegisterPrefix("(", &ParenPrefixParselet);
And finally, main():
int main() {
    Grammar g;
    g.RegisterPrefix("(", &ParenPrefixParselet);
    Parser parser(g);
    Expression* e = parser.parseExpression(0);
    std::cout << static_cast<std::string>(*e);
    return 0;
}
Believe it or not, I think that's pretty minimal. Remember, this is a parser. Keep in mind that this is a minimal example that I plan on expanding, but hopefully you get the idea.
Recommended answer
You wish to use polymorphism, and there are two ways to get it: references or pointers. The trouble with references is that returning them is dangerous. Returning a reference to a local object leaves it dangling, and using that reference is undefined behavior. That leaves us with pointers.
But don't use new and delete directly. They are unsafe and hard to deal with, especially in a multi-scope environment. Use a smart pointer. Use a unique_ptr:
#include <memory>

struct expression {
    virtual void foo() = 0;
    virtual ~expression() = default;
};

struct prefix_expression : expression {
    virtual void foo() { /* default impl */ }
    // dummy c-tor
    prefix_expression(int) {}
};

// note that parse() returns a pointer to any *expression*!
std::unique_ptr<expression> parse() {
    // pass to make_unique whatever arguments the constructor of prefix_expression needs
    return std::make_unique<prefix_expression>(42);
}

int main() {
    {
        auto expr = parse();
        // here, *expr* goes out of scope and properly deletes whatever it has new-ed
    }
}
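Applied to the question's Paren, the same idea might look like the sketch below. The Name leaf class, the str() member and parseParen() are hypothetical, introduced only for illustration. The point is that std::move hands the child over to its parent, and no hand-written destructor is needed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical reduced versions of the question's classes.
struct Expression {
    virtual ~Expression() = default;
    virtual std::string str() const = 0;
};

struct Name : Expression {                 // hypothetical leaf node
    std::string text;
    explicit Name(std::string t) : text(std::move(t)) {}
    std::string str() const override { return text; }
};

struct Paren : Expression {
    std::unique_ptr<Expression> value;     // Paren owns its child
    explicit Paren(std::unique_ptr<Expression> e) : value(std::move(e)) {}
    std::string str() const override { return "(" + value->str() + ")"; }
    // no destructor needed: unique_ptr deletes the child automatically
};

// Stand-in for a parselet: builds the child, then hands it to the parent.
std::unique_ptr<Expression> parseParen(const std::string& inner) {
    auto child = std::make_unique<Name>(inner);
    // std::move transfers ownership into the Paren; child is empty afterwards
    return std::make_unique<Paren>(std::move(child));
}
```

Because ownership is spelled out in the types, a caller that ignores the return value of parseParen() still does not leak anything.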
To also answer the question in the title: no, that by itself is not undefined behavior, but nothing will delete the object for you.
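A small sketch of why returning *(new ...) as a reference is legal but awkward, assuming a hypothetical Node type that counts its live instances (none of these names come from the question). The object outlives the reference's scope, and the only way to free it is to take its address back and delete it by hand:

```cpp
#include <cassert>

// Hypothetical node type that counts live instances, so lifetimes are observable.
struct Node {
    static int alive;
    Node() { ++alive; }
    virtual ~Node() { --alive; }
};
int Node::alive = 0;

// Same shape as the question's last parse(): well-defined, but nothing owns the object.
Node& makeNode() {
    return *(new Node());
}

// Releasing the object means turning the reference back into a pointer.
void destroy(Node& n) {
    delete &n;
}

// Reports how many objects created here are still alive after the reference
// has gone out of scope, then cleans up so the sketch itself does not leak.
int liveAfterScope() {
    int before = Node::alive;
    Node* keep = nullptr;
    {
        Node& n = makeNode();
        keep = &n;               // remember the address for cleanup
    }                            // the reference ends here; no destructor runs
    int stillAlive = Node::alive - before;
    delete keep;
    return stillAlive;
}
```

In real code it is easy to forget the destroy() step, which is the practical argument for returning a std::unique_ptr instead.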