C - fastest method to swap two memory blocks of equal size? (Solution feasibility)


Question

This question is an extension of this one. Here I present two possible solutions and I want to know their feasibility. I am using a Haswell microarchitecture with the GCC/ICC compilers. I also assume that the memory is aligned.

OPTION 1 - I have a memory location already allocated and do 3 memory moves. (I use memmove instead of memcpy to avoid the copy constructor.)

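#include <string.h>  /* memmove */

/* Pre-allocated scratch buffer of at least TO_MOVE bytes.
   Its declaration is not shown in the original; extern is an assumption. */
extern void *aux;

void swap_memory(void *A, void* B, size_t TO_MOVE){

    memmove(aux, B, TO_MOVE);  /* save B */
    memmove(B, A, TO_MOVE);    /* A -> B */
    memmove(A, aux, TO_MOVE);  /* saved B -> A */
}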

OPTION 2 - Use AVX or AVX2 loads and stores, taking advantage of the aligned memory. For this solution I assume that I am swapping int data.

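#include <immintrin.h>  /* AVX intrinsics; build with e.g. -mavx */

void swap_memory(int *A, int* B, int NUM_ELEMS){

    /* Largest multiple of 8 ints (one 256-bit vector) not exceeding NUM_ELEMS. */
    int i, STOP_VEC = NUM_ELEMS - NUM_ELEMS%8;
    __m256i data_A, data_B;

    /* Aligned loads and stores: A and B must be 32-byte aligned. */
    for (i=0; i<STOP_VEC; i+=8) {
        data_A = _mm256_load_si256((__m256i*)&A[i]);
        data_B = _mm256_load_si256((__m256i*)&B[i]);

        _mm256_store_si256((__m256i*)&A[i], data_B);
        _mm256_store_si256((__m256i*)&B[i], data_A);
    }

    /* Scalar tail. std::swap is C++, so swap by hand in C. */
    for (; i<NUM_ELEMS; i++) {
        int tmp = A[i];
        A[i] = B[i];
        B[i] = tmp;
    }
}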

Is option 2 the fastest? Is there another, faster implementation that I didn't mention?

Recommended answer

If you know for sure that the memory is aligned, using AVX may be best. Note that doing it explicitly with intrinsics may not be portable; it might be better to decorate the pointers so that the compiler knows they are aligned (e.g. using an aligned attribute or similar).
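One way to "decorate" the pointers is sketched below, assuming GCC or Clang. `__builtin_assume_aligned` is a builtin of those compilers (other compilers have their own equivalents); the function name `swap_ints_aligned` and the 32-byte figure are illustrative choices, not part of the original question. The builtin only passes the caller's promise to the optimizer, so calling this with unaligned pointers is undefined behavior.

```c
#include <stddef.h>

/* Hypothetical helper: swaps n ints between two buffers that the CALLER
   guarantees are 32-byte aligned. The plain loop leaves the choice of
   vector instructions to the compiler. */
void swap_ints_aligned(int *a, int *b, size_t n)
{
    /* Assumption: GCC/Clang builtin; it returns the same pointer with an
       alignment promise attached for the optimizer. */
    int *pa = __builtin_assume_aligned(a, 32);
    int *pb = __builtin_assume_aligned(b, 32);

    for (size_t i = 0; i < n; i++) {
        int tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

Whether the compiler actually emits aligned AVX loads and stores depends on the optimization level and target flags, so it is worth inspecting the generated assembly.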

Most likely option 2 (or something semantically equivalent to it) will be faster, since the pointers in option 1 aren't declared restrict or anything. The compiler may not know that it's safe to reorder the memory operations or to leave "aux" untouched.
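The point about restrict can be illustrated with a minimal sketch. `restrict` is a standard C99 qualifier; the function name `swap_ints_restrict` is hypothetical. The qualifier is a promise from the caller that the two ranges do not overlap, which gives the compiler more freedom to reorder and vectorize the loads and stores. Passing overlapping ranges is undefined behavior.

```c
#include <stddef.h>

/* Hypothetical helper: element-wise swap of two non-overlapping int arrays.
   restrict tells the compiler that a and b never alias each other. */
void swap_ints_restrict(int *restrict a, int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}
```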

Further, option 2 may be more thread-safe, depending on how aux is set up.

It might be fine to use a local temporary and memcpy to and from that temporary, in blocks or even all at once, as gcc might be able to vectorize that. Avoid using external temporaries, and make sure all of your structures are decorated as aligned.
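A block-wise version of that suggestion might look like the sketch below. The name `swap_blocks` and the 256-byte chunk size are assumptions made for illustration and are not tuned for Haswell. The temporary lives on the stack, so each call (and each thread) gets its own copy, unlike the shared aux buffer in option 1. The two regions must not overlap.

```c
#include <stddef.h>
#include <string.h>

enum { SWAP_CHUNK = 256 };  /* hypothetical block size in bytes, not tuned */

/* Hypothetical helper: swaps n bytes between two non-overlapping regions,
   going through a small local buffer one chunk at a time. */
void swap_blocks(void *a, void *b, size_t n)
{
    unsigned char *pa = a;
    unsigned char *pb = b;
    unsigned char tmp[SWAP_CHUNK];  /* local temporary, private to this call */

    while (n > 0) {
        size_t len = n < SWAP_CHUNK ? n : SWAP_CHUNK;
        memcpy(tmp, pa, len);  /* save chunk of a */
        memcpy(pa, pb, len);   /* b -> a */
        memcpy(pb, tmp, len);  /* saved a -> b */
        pa += len;
        pb += len;
        n -= len;
    }
}
```

With a fixed, small chunk size the compiler may expand the memcpy calls inline, but that is not guaranteed; measuring against the explicit AVX loop is the only reliable way to compare them.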
