Performance: memset
Question
I have simple C code that does this (pseudocode):
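#define N 100000000  /* buffer size in bytes, not in ints */
int *DataSrc = (int *) malloc(N);
int *DataDest = (int *) malloc(N);
memset(DataSrc, 0, N);
for (int i = 0; i < 4; i++) {
    StartTimer();                  /* timer helpers are not shown */
    memcpy(DataDest, DataSrc, N);
    StopTimer();
}
/* Read one element so the copies have an observable result;
   RandomInteger must be smaller than N / sizeof(int). */
printf("%d\n", DataDest[RandomInteger]);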
My PC: Intel Core i7-3930, with 4x4GB DDR3 1600 memory, running RedHat 6.1 64-bit.
The first memcpy() runs at 1.9 GB/s, while the next three run at 6.2 GB/s. The buffer size (N) is too big for this to be caused by cache effects. So, my first question:
- Why is the first memcpy() so much slower? Maybe malloc() doesn't fully allocate the memory until you use it?
If I eliminate the memset(), the first memcpy() runs at about 1.5 GB/s, but the next three run at 11.8 GB/s, almost a 2x speedup. My second question:
- Why is memcpy() 2x faster if I don't call memset()?
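Recommended answer
As others have already pointed out, Linux uses an optimistic memory allocation strategy: malloc() only reserves address space for a large request, and physical pages are assigned when the memory is first touched.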
The difference between the first memcpy and the following ones is the initialization of DataDest: the first copy is the first time its pages are written, so it also pays for allocating them.
As you have already seen, when you eliminate memset(DataSrc, 0, N), the first memcpy is even slower, because the pages for the source must be allocated as well. When you initialize both DataSrc and DataDest, e.g.
memset(DataSrc, 0, N);
memset(DataDest, 0, N);
all memcpy calls will run at roughly the same speed.
For the second question: when you initialize the allocated memory with memset, all pages of a buffer will be laid out consecutively. On the other hand, when the memory is allocated as you copy, the source and destination pages will be allocated interleaved, which might make the difference.