Is it better to use std::memcpy() or std::copy() in terms of performance?


Question


Is it better to use memcpy as shown below, or is it better to use std::copy() in terms of performance? Why?

char *bits = NULL;
...

bits = new (std::nothrow) char[((int *) copyMe->bits)[0]];
if (bits == NULL)
{
    cout << "ERROR Not enough memory.\n";
    exit(1);
}

memcpy (bits, copyMe->bits, ((int *) copyMe->bits)[0]);

Solution

I'm going to go against the general wisdom here that std::copy will have a slight, almost imperceptible performance loss. I just did a test and found that to be untrue: I did notice a performance difference. However, the winner was std::copy.

I wrote a C++ SHA-2 implementation. In my test, I hash 5 strings using all four SHA-2 versions (224, 256, 384, 512), and I loop 300 times. I measure times using Boost.Timer. That 300-loop counter is enough to completely stabilize my results. I ran the test 5 times each, alternating between the memcpy version and the std::copy version. My code takes advantage of grabbing data in as large chunks as possible: many other implementations operate on char / char *, whereas I operate on T / T * (where T is the largest type in the user's implementation that has correct overflow behavior), so fast memory access on the largest types I can is central to the performance of my algorithm. These are my results:

Time (in seconds) to complete run of SHA-2 tests

std::copy   memcpy  % increase
6.11        6.29    2.86%
6.09        6.28    3.03%
6.10        6.29    3.02%
6.08        6.27    3.03%
6.08        6.27    3.03%

Total average increase in speed of std::copy over memcpy: 2.99%

My compiler is gcc 4.6.3 on Fedora 16 x86_64. My optimization flags are -Ofast -march=native -funsafe-loop-optimizations.

Code for my SHA-2 implementations.

I decided to run a test on my MD5 implementation as well. The results were much less stable, so I decided to do 10 runs. However, after my first few attempts, I got results that varied wildly from one run to the next, so I'm guessing there was some sort of OS activity going on. I decided to start over.

Same compiler settings and flags. There is only one version of MD5, and it's faster than SHA-2, so I did 3000 loops on a similar set of 5 test strings.

These are my final 10 results:

Time (in seconds) to complete run of MD5 tests

std::copy   memcpy      % difference
5.52        5.56        +0.72%
5.56        5.55        -0.18%
5.57        5.53        -0.72%
5.57        5.52        -0.91%
5.56        5.57        +0.18%
5.56        5.57        +0.18%
5.56        5.53        -0.54%
5.53        5.57        +0.72%
5.59        5.57        -0.36%
5.57        5.56        -0.18%

Total average decrease in speed of std::copy over memcpy: 0.11%

Code for my MD5 implementation

These results suggest that there is some optimization that std::copy used in my SHA-2 tests that std::copy could not use in my MD5 tests. In the SHA-2 tests, both arrays were created in the same function that called std::copy / memcpy. In my MD5 tests, one of the arrays was passed in to the function as a function parameter.

I did a little bit more testing to see what I could do to make std::copy faster again. The answer turned out to be simple: turn on link time optimization. These are my results with LTO turned on (option -flto in gcc):

Time (in seconds) to complete run of MD5 tests with -flto

std::copy   memcpy      % difference
5.54        5.57        +0.54%
5.50        5.53        +0.54%
5.54        5.58        +0.72%
5.50        5.57        +1.26%
5.54        5.58        +0.72%
5.54        5.57        +0.54%
5.54        5.56        +0.36%
5.54        5.58        +0.72%
5.51        5.58        +1.25%
5.54        5.57        +0.54%

Total average increase in speed of std::copy over memcpy: 0.72%

In summary, there does not appear to be a performance penalty for using std::copy. In fact, there appears to be a performance gain.

Explanation of results

So why might std::copy give a performance boost?

First, I would not expect it to be slower for any implementation, as long as the optimization of inlining is turned on. All compilers inline aggressively; it is possibly the most important optimization because it enables so many other optimizations. std::copy can (and I suspect all real world implementations do) detect that the arguments are trivially copyable and that memory is laid out sequentially. This means that in the worst case, when memcpy is legal, std::copy should perform no worse. The trivial implementation of std::copy that defers to memcpy should meet your compiler's criteria of "always inline this when optimizing for speed or size".

However, std::copy also keeps more of its information. When you call std::copy, the function keeps the types intact. memcpy operates on void *, which discards almost all useful information. For instance, if I pass in an array of std::uint64_t, the compiler or library implementer may be able to take advantage of 64-bit alignment with std::copy, but it may be more difficult to do so with memcpy. Many implementations of algorithms like this work by first working on the unaligned portion at the start of the range, then the aligned portion, then the unaligned portion at the end. If it is all guaranteed to be aligned, then the code becomes simpler and faster, and easier for the branch predictor in your processor to get correct.

Premature optimization?

std::copy is in an interesting position. I expect it to never be slower than memcpy and sometimes faster with any modern optimizing compiler. Moreover, anything that you can memcpy, you can std::copy. memcpy does not allow any overlap in the buffers, whereas std::copy supports overlap in one direction (with std::copy_backward for the other direction of overlap). memcpy only works on pointers, std::copy works on any iterators (std::map, std::vector, std::deque, or my own custom type). In other words, you should just use std::copy when you need to copy chunks of data around.
