Copying strided data in C++
Problem description
I have two arrays and I want to copy one array into the other with some stride. For example, I have
A A A A A A A A ...
B B B B B B B B ...
and I want to copy the elements of B into every third position of A to obtain
B A A B A A B A ...
From the post "Is there a standard, strided version of memcpy?", it seems that there is no such possibility in C.
However, I have found that, in some cases, memcpy is faster than a copy based on a for loop.
My question is: is there any way to perform a strided memory copy in C++ that performs at least as well as a standard for loop?
Thank you very much.
EDIT - CLARIFICATION OF THE PROBLEM
To make the problem clearer, let us denote the two arrays at hand by a and b. I have a function that performs only the following for loop
for (int i = 0; i < NumElements; i++)
    a_[i] = b_[i];
where both [] operators are overloaded (I'm using an expression templates technique), so that the assignment can actually mean, for example,
a[3*i] = b[i];
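To make the overloaded-index idea concrete, here is a minimal sketch of how such a mapping could look. The names `StridedView` and `copy_elements` are hypothetical and do not come from the original post; the asker's actual expression-template classes are not shown, so this only illustrates the general technique.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): a non-owning view whose
// operator[] maps logical index i to physical index i * stride.
template <typename T>
struct StridedView {
    T* data;
    std::size_t stride;

    T& operator[](std::size_t i) const {
        assert(data != nullptr);
        return data[i * stride];
    }
};

// Generic element-wise copy. Dst and Src can be raw pointers or any type
// that provides operator[], such as StridedView above.
template <typename Dst, typename Src>
void copy_elements(Dst dst, Src src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = src[i];
}
```

With this sketch, `copy_elements(StridedView<int>{a, 3}, b, n)` performs `a[3*i] = b[i]` for each `i`, while the loop body itself stays the plain `dst[i] = src[i]`.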
Is there any way to efficiently perform strided memory copy in C++ performing at least as a standard for loop?
Edit 2: There is no strided counterpart of memcpy in the C++ standard library.
Since strided copying is not as common as plain memory copying, neither chip manufacturers nor language designers have provided specialized support for it.
Assuming a standard for loop, you may be able to gain some performance by using loop unrolling. Some compilers have options to unroll loops; it's not a "standard" option.
Given a standard for loop:
#define RESULT_SIZE 72
#define SIZE_A 48
#define SIZE_B 24
unsigned int A[SIZE_A];
unsigned int B[SIZE_B];
unsigned int result[RESULT_SIZE];
unsigned int index_a = 0;
unsigned int index_b = 0;
unsigned int index_result = 0;
for (index_result = 0; index_result < RESULT_SIZE;)
{
result[index_result++] = B[index_b++];
result[index_result++] = A[index_a++];
result[index_result++] = A[index_a++];
}
Loop unrolling would repeat the contents of the "standard" for loop:
for (index_result = 0; index_result < RESULT_SIZE;)
{
result[index_result++] = B[index_b++];
result[index_result++] = A[index_a++];
result[index_result++] = A[index_a++];
result[index_result++] = B[index_b++];
result[index_result++] = A[index_a++];
result[index_result++] = A[index_a++];
}
In the unrolled version, the number of loop iterations has been cut in half, so the loop condition is tested and the branch is taken half as often.
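The unrolled loop above relies on the element count being even. A minimal sketch of the same idea for an arbitrary stride and count follows; the function name `strided_copy_unrolled` and its parameters are assumptions made for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: writes src[0..count) to dst[0], dst[stride],
// dst[2 * stride], ... The caller must ensure dst holds at least
// (count - 1) * stride + 1 elements when count > 0.
inline void strided_copy_unrolled(unsigned int* dst, std::size_t stride,
                                  const unsigned int* src, std::size_t count) {
    assert(stride > 0);
    std::size_t i = 0;
    // Main loop, unrolled by two: handles elements in pairs.
    for (; i + 1 < count; i += 2) {
        dst[i * stride] = src[i];
        dst[(i + 1) * stride] = src[i + 1];
    }
    // Tail: handles the last element when count is odd.
    if (i < count)
        dst[i * stride] = src[i];
}
```

Whether the manual unrolling helps at all depends on the compiler and target; an optimizing compiler may already unroll the simple loop on its own, so measuring both versions is advisable.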
The performance improvement from unrolling may be negligible compared to other options. The following issues affect performance, and addressing each one may yield a different speedup:
- Processing data cache misses
- Reloading of instruction pipeline (depends on processor)
- Operating System swapping memory with disk
- Other tasks running concurrently
- Parallel processing (depends on processor / platform)
One example of parallel processing is to have one processor copy the B items to the new array and another processor copy the A items to the new array.
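The two-processor split described above could be sketched with `std::thread` as follows. The function name `interleave_parallel` is hypothetical, and the sketch assumes `a` holds at least twice as many elements as `b`. The two threads write to disjoint elements of the result, so no locking is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: builds B A A B A A ... using two threads.
// Assumption: a.size() >= 2 * b.size().
inline std::vector<unsigned int> interleave_parallel(
        const std::vector<unsigned int>& a,
        const std::vector<unsigned int>& b) {
    assert(a.size() >= 2 * b.size());
    std::vector<unsigned int> result(3 * b.size());

    // First thread: copy the B items into slots 0, 3, 6, ...
    std::thread copy_b([&] {
        for (std::size_t i = 0; i < b.size(); ++i)
            result[3 * i] = b[i];
    });
    // Second thread: copy the A items into the two slots after each B item.
    std::thread copy_a([&] {
        for (std::size_t i = 0; i < b.size(); ++i) {
            result[3 * i + 1] = a[2 * i];
            result[3 * i + 2] = a[2 * i + 1];
        }
    });

    copy_b.join();
    copy_a.join();
    return result;
}
```

For arrays as small as the 72-element example, the cost of starting threads will very likely outweigh any gain, and neighbouring writes from different threads can contend for the same cache lines, so this approach is only worth measuring on large inputs.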