C ++使用SSE指令比较int的巨大向量 [英] C++ use SSE instructions for comparing huge vectors of ints

查看:241
本文介绍了C ++使用SSE指令比较int的巨大向量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个巨大的向量< vector< int>> (18M x 128)。通常我想要这个向量的2行,并通过这个函数比较:

I have a huge vector<vector<int>> (18M x 128). Frequently I want to take 2 rows of this vector and compare them by this function:

    int getDiff(int indx1, int indx2) {
    int result = 0;
    int pplus, pminus, tmp;

    for (int k = 0; k < 128; k += 2) {
        pplus = nodeL[indx2][k] - nodeL[indx1][k];
        pminus = nodeL[indx1][k + 1] - nodeL[indx2][k + 1];

        tmp = max(pplus, pminus);
        if (tmp > result) {
            result = tmp;
        }
    }
    return result;
}

正如你所看到的,函数,循环通过两个行向量做一些减法并在结束时返回最大值。这个函数将被使用一百万次,所以我想知道是否可以通过SSE指令加速。我使用Ubuntu 12.04和gcc。

As you see, the function, loops through the two row vectors does some subtraction and at the end returns a maximum. This function will be used a million times, so I was wondering if it can be accelerated through SSE instructions. I use Ubuntu 12.04 and gcc.

当然是微优化,但如果你能提供一些帮助,因为我对SSE一无所知。提前感谢

Of course it is microoptimization but it would helpful if you could provide some help, since I know nothing about SSE. Thanks in advance

基准:

    int nofTestCases = 10000000;

    vector<int> nodeIds(nofTestCases);
    vector<int> goalNodeIds(nofTestCases);
    vector<int> results(nofTestCases);

    for (int l = 0; l < nofTestCases; l++) {
        nodeIds[l] = randomNodeID(18000000);
        goalNodeIds[l] = randomNodeID(18000000);
    }



    double time, result;

    time = timestamp();
    for (int l = 0; l < nofTestCases; l++) {
        results[l] = getDiff2(nodeIds[l], goalNodeIds[l]);
    }
    result = timestamp() - time;
    cout << result / nofTestCases << "s" << endl;

    time = timestamp();
    for (int l = 0; l < nofTestCases; l++) {
        results[l] = getDiff(nodeIds[l], goalNodeIds[l]);
    }
    result = timestamp() - time;
    cout << result / nofTestCases << "s" << endl;

其中

int randomNodeID(int n) {
    return (int) (rand() / (double) (RAND_MAX + 1.0) * n);
}

/** Returns a timestamp ('now') in seconds (incl. a fractional part). */
inline double timestamp() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return double(tp.tv_sec) + tp.tv_usec / 1000000.;
}


推荐答案

SSE版本(SSE4.1),似乎运行速度比Core i7上的原始标量代码快20%:

FWIW I put together a pure SSE version (SSE4.1) which seems to run around 20% faster than the original scalar code on a Core i7:

#include <smmintrin.h>

int getDiff_SSE(int indx1, int indx2)
{
    int result[4] __attribute__ ((aligned(16))) = { 0 };

    const int * const p1 = &nodeL[indx1][0];
    const int * const p2 = &nodeL[indx2][0];

    const __m128i vke = _mm_set_epi32(0, -1, 0, -1);
    const __m128i vko = _mm_set_epi32(-1, 0, -1, 0);

    __m128i vresult = _mm_set1_epi32(0);

    for (int k = 0; k < 128; k += 4)
    {
        __m128i v1, v2, vmax;

        v1 = _mm_loadu_si128((__m128i *)&p1[k]);
        v2 = _mm_loadu_si128((__m128i *)&p2[k]);
        v1 = _mm_xor_si128(v1, vke);
        v2 = _mm_xor_si128(v2, vko);
        v1 = _mm_sub_epi32(v1, vke);
        v2 = _mm_sub_epi32(v2, vko);
        vmax = _mm_add_epi32(v1, v2);
        vresult = _mm_max_epi32(vresult, vmax);
    }
    _mm_store_si128((__m128i *)result, vresult);
    return max(max(max(result[0], result[1]), result[2]), result[3]);
}

这篇关于C ++使用SSE指令比较int的巨大向量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆