Copying strided data in C++


Problem description


I have two arrays and I want to copy one array into the other with some stride. For example, I have

A A A A A A A A ...

B B B B B B B B ...

and I want to copy the elements of B into every third position of A to obtain

B A A B A A B A ...

From the post "Is there a standard, strided version of memcpy?", it seems that there is no such possibility in C.

However, I have found that, in some cases, memcpy is faster than a copy based on a for loop.

My question is: is there any way to perform a strided memory copy in C++ that is at least as fast as a standard for loop?

Thank you very much.

EDIT - CLARIFICATION OF THE PROBLEM

To make the problem clearer, let us denote the two arrays at hand by a and b. I have a function that performs the following single for loop

for (int i=0; i<NumElements; i++)
    a_[i] = b_[i];

where both []'s are overloaded operators (I'm using an expression templates technique), so that the loop can actually mean, for example,

a[3*i]=b[i];

Solution

Is there any way to efficiently perform strided memory copy in C++ performing at least as a standard for loop?

Edit 2: There is no strided counterpart to memcpy in the C++ standard library.

Since strided copying is not as common as plain memory copying, neither chip manufacturers nor language designers have provided much specialized support for it.
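
As a baseline for the discussion that follows, a hand-written strided copy can be sketched as below. The name strided_copy and its parameters are hypothetical (nothing like it exists in the standard library); the sketch assumes dst has room for at least (count - 1) * stride + 1 elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not a standard function:
// writes src[i] into dst[i * stride] for i in [0, count).
// Assumption: dst holds at least (count - 1) * stride + 1 elements.
template <typename T>
void strided_copy(T* dst, const T* src, std::size_t count, std::size_t stride)
{
    assert(stride > 0);  // a stride of 0 would overwrite dst[0] repeatedly
    for (std::size_t i = 0; i < count; ++i)
        dst[i * stride] = src[i];
}
```

With stride 3 this corresponds to the a[3*i]=b[i] case from the question. An optimizing compiler may already unroll or vectorize such a loop, so it is worth measuring before tuning by hand.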

Assuming a standard for loop, you may be able to gain some performance by using loop unrolling. Some compilers have options to unroll loops; these options are compiler-specific and not part of the language standard.

Given a standard for loop:

#define RESULT_SIZE 72
#define SIZE_A 48
#define SIZE_B 24

unsigned int A[SIZE_A];
unsigned int B[SIZE_B];
unsigned int result[RESULT_SIZE];

unsigned int index_a = 0;
unsigned int index_b = 0;
unsigned int index_result = 0;
for (index_result = 0; index_result < RESULT_SIZE;)
{
   result[index_result++] = B[index_b++];
   result[index_result++] = A[index_a++];
   result[index_result++] = A[index_a++]; 
}

Loop unrolling would repeat the contents of the "standard" for loop:

for (index_result = 0; index_result < RESULT_SIZE;)
{
   result[index_result++] = B[index_b++];
   result[index_result++] = A[index_a++];
   result[index_result++] = A[index_a++]; 

   result[index_result++] = B[index_b++];
   result[index_result++] = A[index_a++];
   result[index_result++] = A[index_a++]; 
}

In the unrolled version, the number of loop iterations has been cut in half.
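
The example above works because RESULT_SIZE is a multiple of the unrolled body length. For arbitrary sizes, an unrolled loop usually needs a remainder loop. The following is a minimal sketch under that assumption; the function name strided_copy_unrolled and the unroll factor of 4 are illustrative choices, not taken from the original answer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: dst[i * stride] = src[i], manually unrolled by 4.
// Assumption: dst holds at least (count - 1) * stride + 1 elements.
template <typename T>
void strided_copy_unrolled(T* dst, const T* src,
                           std::size_t count, std::size_t stride)
{
    assert(stride > 0);
    std::size_t i = 0;

    // Main loop: four copies per iteration, so the loop test runs
    // roughly a quarter as often as in the plain loop.
    for (; i + 4 <= count; i += 4) {
        dst[(i + 0) * stride] = src[i + 0];
        dst[(i + 1) * stride] = src[i + 1];
        dst[(i + 2) * stride] = src[i + 2];
        dst[(i + 3) * stride] = src[i + 3];
    }

    // Remainder loop: handles the last count % 4 elements.
    for (; i < count; ++i)
        dst[i * stride] = src[i];
}
```

Whether this beats the plain loop depends on the compiler and target; many compilers perform the same transformation automatically at higher optimization levels.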

The performance gain from unrolling may be negligible compared to other factors. The following issues all affect performance, and addressing each one may yield a different speed improvement:

  • Processing data cache misses
  • Reloading of instruction pipeline (depends on processor)
  • Operating System swapping memory with disk
  • Other tasks running concurrently
  • Parallel processing (depends on processor / platform)

One example of parallel processing is to have one processor copy the B items to the new array and another processor copy the A items to the new array.
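
The two-processor idea can be sketched with std::thread as follows. This is only an illustration: the function name interleave_parallel is hypothetical, and the sketch assumes result holds 3 * count_b elements and a holds 2 * count_b elements, matching the B A A pattern used above.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: builds the B A A B A A ... pattern with two threads.
// Assumptions: result holds 3 * count_b elements, a holds 2 * count_b
// elements, and b holds count_b elements.
void interleave_parallel(unsigned int* result, const unsigned int* a,
                         const unsigned int* b, std::size_t count_b)
{
    assert(result != nullptr && a != nullptr && b != nullptr);

    // Worker thread: copies the B items into every third slot.
    std::thread copy_b([=] {
        for (std::size_t i = 0; i < count_b; ++i)
            result[3 * i] = b[i];
    });

    // Calling thread: copies the A items into the two slots after each B.
    for (std::size_t i = 0; i < count_b; ++i) {
        result[3 * i + 1] = a[2 * i];
        result[3 * i + 2] = a[2 * i + 1];
    }

    copy_b.join();  // wait until all B items have been written
}
```

The two threads write to different array elements, so there is no data race, but they do write to the same cache lines. That can cause false sharing, and together with the cost of starting a thread it may make this version slower than a single loop for small arrays. On some toolchains the program must be built with -pthread.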
