Should I use pointers or move semantics for passing big chunks of data?


Problem description

I have a question about recommended coding technique. I have a tool for model analysis, and I sometimes need to pass a large amount of data (from a factory class to a class that holds multiple heterogeneous chunks).

My question is whether there is some consensus on whether I should use pointers or move the ownership (I need to avoid copying where possible, as a data block may be as large as 1 GB).

The pointer version would look like this:

class FactoryClass {
...
public:
   static Data * createData() {
      Data * data = new Data;
      ...
      return data;
   }
};

class StorageClass {
   unique_ptr<Data> data_ptr;
...
public:
   void setData(Data * _data_ptr) {
      data_ptr.reset(_data_ptr);
   }
};

void pass() {
   Data * data = FactoryClass::createData();
   ...
   StorageClass storage;
   storage.setData(data);
}

Whereas the move version is like this:

class FactoryClass {
...
public:
   static Data createData() {
      Data data;
      ...
      return data;
   }
};

class StorageClass {
   Data data;
...
public:
   void setData(Data _data) {
      data = move(_data);
   }
};

void pass() {
   Data data = FactoryClass::createData();
   ...
   StorageClass storage;
   storage.setData(move(data));
}

I like the move version better. Yes, I need to add move calls to the main code, but in the end I have just the objects in the storage and I do not have to care about pointer semantics anymore.

However, I am not quite comfortable using move semantics, which I do not understand in detail. (I do not care about the C++11 requirement, though, as the code already compiles only with GCC 4.7+.)

Does anyone have a reference that would support either version? Or is there some other, preferred way to pass the data?

I was not able to Google anything as the keywords usually led to other topics.

Thanks.

EDIT NOTE: The second example got refactored to incorporate suggestions from the comments, the semantics remained unchanged.

Solution

When you are passing an object to a function, what you pass depends in part on how that function is going to use it. A function can use an object in one of three general ways:

  1. It can simply reference the object for the duration of the function call, with the calling function (or its eventual parent up the call stack) maintaining ownership of the object. The reference in this case may be a constant reference or a modifiable reference. The function will not store this object long-term.

  2. It can copy the object directly. It doesn't gain ownership of the original, but it does acquire a copy of the original, so as to store, modify, or do with the copy what it will. Note that the difference between #1 and this is that the copy is made explicit in the parameter list. For example, taking a std::string by value. But this could also be as simple as taking an int by value.

  3. It can gain some form of ownership of the object. The function then has some responsibility over the object's destruction. This also allows the function to store the object long-term.

My general recommendations for the parameter types for these paradigms are as follows:

  1. Take the object by an explicit language reference where possible. If that's not possible, try a std::reference_wrapper. If that can't work, and no other solutions seem reasonable, then use a pointer. A pointer would be for things like optional parameters (though std::optional, which was standardized in C++17, makes pointers less necessary for that; pointers will still have uses, though), language arrays (though again, we have objects that cover most of the uses of these), and so forth.

  2. Take the object by value. That one's pretty non-negotiable.

  3. Take the object either by value-move (i.e., move it into a by-value parameter) or by a smart pointer to the object (which will also be taken by value, since you're going to copy/move it anyway). The problem with your code is that you're transferring ownership via a pointer, but with a raw pointer. Raw pointers have no ownership semantics. The moment you allocate anything on the heap, you should immediately wrap the resulting pointer in some kind of smart pointer. So your factory function should have returned a unique_ptr.
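As a rough sketch of the three conventions above, the following C++11 code shows one possible signature for each case. The names `countValues`, `doubled`, `ValueStorage`, `PointerStorage`, `createDataOnHeap`, and `passExample` are hypothetical and do not come from the question, and `Data` is assumed here to be a simple wrapper around a `std::vector<double>`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Data type: a large buffer.
struct Data {
    std::vector<double> values;
};

// Case 1: borrow for the duration of the call; the caller keeps ownership.
std::size_t countValues(const Data& data) {
    return data.values.size();
}

// Case 2: take a private copy; the caller's object is not modified.
Data doubled(Data copy) {
    for (double& v : copy.values) v *= 2.0;
    return copy;
}

// Case 3a: take ownership by value-move.
class ValueStorage {
    Data data_;
public:
    void setData(Data data) { data_ = std::move(data); }
    std::size_t size() const { return data_.values.size(); }
};

// Case 3b: take ownership through a smart pointer, itself passed by value.
class PointerStorage {
    std::unique_ptr<Data> data_;
public:
    void setData(std::unique_ptr<Data> data) { data_ = std::move(data); }
    bool hasData() const { return data_ != nullptr; }
    std::size_t size() const { return data_ ? data_->values.size() : 0; }
};

// Factory that returns an owning smart pointer instead of a raw pointer.
std::unique_ptr<Data> createDataOnHeap(std::size_t n) {
    std::unique_ptr<Data> data(new Data);  // C++11 form; std::make_unique needs C++14
    data->values.assign(n, 1.0);
    return data;
}

// Walks through all three cases with small data.
void passExample() {
    Data local;
    local.values.assign(3, 1.5);
    assert(countValues(local) == 3);          // case 1: local is only borrowed
    assert(doubled(local).values[0] == 3.0);  // case 2: the callee works on a copy
    assert(local.values[0] == 1.5);           //         the original is untouched

    ValueStorage byValue;
    byValue.setData(std::move(local));        // case 3a: ownership moves in
    PointerStorage byPointer;
    byPointer.setData(createDataOnHeap(5));   // case 3b: ownership moves in
    assert(byValue.size() == 3 && byPointer.size() == 5);
}
```

With the `unique_ptr` signatures, ownership is visible in the types at every step, so no raw owning pointer ever crosses a function boundary.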

Your case appears to be #3. Which you use between value-move and smart pointer is entirely up to you. If you have to heap allocate Data for some reason, then the choice is pretty much made for you. If Data can be stack allocated, then you have some options.

I would generally do this based on an estimation of Data's internal size. If internally, it's just a few pointers/integers (and by "few", I mean like 3-4), then putting it on the stack is fine.

Indeed, it can be better because you'll have less chance of a double cache miss. If Data's functions mostly just access data through an internal pointer and you store Data by pointer, then every function call on it will have to dereference your stored pointer to fetch the internal one, then dereference the internal one. That's two potential cache misses, since neither pointer has any locality with StorageClass.

If you store Data by value, it's much more likely that Data's internal pointer will already be in the cache. It has better locality with StorageClass's other members; if you accessed some of StorageClass before now, you already paid for a cache miss, so you are likely to already have Data in the cache.
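The layout difference behind this locality argument can be sketched as follows. This only illustrates where the `Data` object lives, not a benchmark; `StorageByValue`, `StorageByPointer`, `livesInside`, and `checkLayouts` are hypothetical names, and `Data` is again assumed to wrap a `std::vector<double>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in: a small handle (the vector) to a large heap buffer.
struct Data {
    std::vector<double> values;
};

// Data is embedded in the storage object, next to its other members.
struct StorageByValue {
    int id = 0;
    Data data;
};

// Data lives in a separate heap allocation that the storage only points to.
struct StorageByPointer {
    int id = 0;
    std::unique_ptr<Data> data;
};

// One hop from the storage object: storage -> element buffer.
double firstByValue(const StorageByValue& s) { return s.data.values[0]; }

// Two hops: storage -> Data object -> element buffer.
double firstByPointer(const StorageByPointer& s) { return s.data->values[0]; }

// True if `member` is located within the bytes of the object at `object`.
bool livesInside(const void* member, const void* object, std::size_t objectSize) {
    auto m = reinterpret_cast<std::uintptr_t>(member);
    auto o = reinterpret_cast<std::uintptr_t>(object);
    return m >= o && m < o + objectSize;
}

void checkLayouts() {
    StorageByValue a;
    a.data.values.assign(4, 2.5);
    assert(livesInside(&a.data, &a, sizeof a));            // same memory region as `id`

    StorageByPointer b;
    b.data.reset(new Data);
    b.data->values.assign(4, 2.5);
    assert(!livesInside(b.data.get(), &b, sizeof b));      // somewhere else on the heap

    assert(firstByValue(a) == 2.5 && firstByPointer(b) == 2.5);
}
```

Whether the extra indirection matters in practice depends on the access pattern and should be measured rather than assumed.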

But movement is not free. It's cheaper than a full copy, but it's not free. You're still copying the internal data (and possibly nulling out any pointers on the original). But then again, allocating memory on the heap isn't free either. Nor is deallocating it.

But then again, if you're not moving it around very often (you move it around to get it to its final location, but little more after that), even moving a larger object would be fine. If you're using it more than you're moving it, then the cache locality of the object's storage will probably win out over the cost of moving.
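What a move actually costs depends on how the type stores its contents. The sketch below contrasts two hypothetical types, `HeapData` (contents in a `std::vector`) and `InlineData` (contents in a `std::array`), neither of which appears in the question. Moving the first hands over the heap buffer without touching the elements, while moving the second still copies every element, because there is no pointer to steal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type whose contents live in a heap buffer owned by the vector.
struct HeapData {
    std::vector<double> values;
};

// Hypothetical type whose contents are stored inline in the object itself.
struct InlineData {
    std::array<double, 4> values;
};

// Move construction transfers the buffer: the element address is unchanged.
bool moveStealsHeapBuffer() {
    HeapData source;
    source.values.assign(1000, 1.0);
    const double* buffer = source.values.data();
    HeapData target = std::move(source);
    return target.values.data() == buffer && target.values.size() == 1000;
}

// For inline storage, "moving" doubles is just copying them element by element.
bool moveCopiesInlineStorage() {
    InlineData source;
    source.values.fill(1.0);
    InlineData target = std::move(source);
    return target.values[0] == 1.0
        && source.values[0] == 1.0                             // source still holds its values
        && target.values.data() != source.values.data();       // two separate storages
}

void checkMoveCost() {
    assert(moveStealsHeapBuffer());
    assert(moveCopiesInlineStorage());
}
```

For a 1 GB block held in a vector-like container, the move therefore copies only a few pointer-sized members, which is why moving it into its final location once is cheap.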

There ultimately aren't a lot of technical reasons to pick one or the other. I would say to default to movement where reasonable.
