Why is my SSE code slower than native C++ code?

Views: 43 Posted: 2021/8/27 19:47:25 Tags: c++ sse simd

Problem description

First of all, I am new to SSE. I decided to accelerate my code, but it seems to run slower than my native code.

This is an example that calculates a sum of squares. On my Intel i7-6700HQ it takes 0.43 s for the native code and 0.52 s for the SSE version. So, where is the bottleneck?

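#include <chrono>
#include <iostream>
#include <xmmintrin.h>  // SSE intrinsics (_mm_load_ps, _mm_mul_ps, ...)

using namespace std::chrono;

inline float squared_sum(const float x, const float y)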
{
    return x * x + y * y;
}

#define USE_SIMD

void calculations()
{
    high_resolution_clock::time_point t1, t2;

    int result_v = 0;

    t1 = high_resolution_clock::now();

    alignas(16) float data_x[4];
    alignas(16) float data_y[4];
    alignas(16) float result[4];
    __m128 v_x, v_y, v_res;
    for (int y = 0; y < 5120; y++)
    {
        data_y[0] = y;
        data_y[1] = y + 1;
        data_y[2] = y + 2;
        data_y[3] = y + 3;
        for (int x = 0; x < 5120; x++)
        {
            data_x[0] = x;
            data_x[1] = x + 1;
            data_x[2] = x + 2;
            data_x[3] = x + 3;
#ifdef USE_SIMD
            v_x = _mm_load_ps(data_x);
            v_y = _mm_load_ps(data_y);
            v_x = _mm_mul_ps(v_x, v_x);
            v_y = _mm_mul_ps(v_y, v_y);
            v_res = _mm_add_ps(v_x, v_y);
            _mm_store_ps(result, v_res);
#else
            result[0] = squared_sum(data_x[0], data_y[0]);
            result[1] = squared_sum(data_x[1], data_y[1]);
            result[2] = squared_sum(data_x[2], data_y[2]);
            result[3] = squared_sum(data_x[3], data_y[3]);
#endif

            result_v += (int)(result[0] + result[1] + result[2] + result[3]);
        }
    }

    t2 = high_resolution_clock::now();
    duration<double> time_span1 = duration_cast<duration<double>>(t2 - t1);
    std::cout << "Exec time:\t" << time_span1.count() << " s\n";
}
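
One thing visible in the loop above is that every iteration writes four scalars into data_x, reloads them with _mm_load_ps, stores the result vector, and then reads it back lane by lane. The following is only a minimal sketch, assuming the goal is just the final sum: it keeps the x and y lanes in SSE registers and does the horizontal add with shuffles. The names sum_squares_scalar and sum_squares_sse are hypothetical and not part of the original code, a 64-bit accumulator replaces result_v, and no timing is claimed for it.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical reference: mirrors the scalar branch of the question's loop
// (lanes x..x+3 and y..y+3), but accumulates into a 64-bit integer.
inline long long sum_squares_scalar(int width, int height)
{
    long long total = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            float sum = 0.0f;
            for (int lane = 0; lane < 4; lane++)
            {
                const float fx = static_cast<float>(x + lane);
                const float fy = static_cast<float>(y + lane);
                sum += fx * fx + fy * fy;
            }
            total += static_cast<long long>(sum);
        }
    return total;
}

// Hypothetical SSE variant: operands stay in registers instead of being
// written to small arrays and reloaded on every iteration.
inline long long sum_squares_sse(int width, int height)
{
    assert(width >= 0 && height >= 0);
    const __m128 lane_offsets = _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f); // lane i holds i
    const __m128 one = _mm_set1_ps(1.0f);
    long long total = 0;

    for (int y = 0; y < height; y++)
    {
        const __m128 v_y = _mm_add_ps(_mm_set1_ps(static_cast<float>(y)), lane_offsets);
        const __m128 v_yy = _mm_mul_ps(v_y, v_y);  // constant for the whole row
        __m128 v_x = lane_offsets;                 // x = 0: lanes 0, 1, 2, 3

        for (int x = 0; x < width; x++)
        {
            const __m128 v_res = _mm_add_ps(_mm_mul_ps(v_x, v_x), v_yy);

            // Horizontal sum of the four lanes without going through memory.
            const __m128 swapped = _mm_shuffle_ps(v_res, v_res, _MM_SHUFFLE(2, 3, 0, 1));
            const __m128 pairs = _mm_add_ps(v_res, swapped);     // (r0+r1, r0+r1, r2+r3, r2+r3)
            const __m128 high = _mm_movehl_ps(swapped, pairs);   // lane 0 = r2+r3
            const __m128 sum = _mm_add_ss(pairs, high);          // lane 0 = r0+r1+r2+r3

            total += _mm_cvttss_si32(sum);  // truncating conversion, like the (int) cast
            v_x = _mm_add_ps(v_x, one);     // advance all four x lanes by one
        }
    }
    return total;
}
```

Calling sum_squares_sse(5120, 5120) would cover the same index range as the question. The horizontal add groups the lanes as (r0+r1)+(r2+r3) rather than left to right, so for large inputs the float rounding may differ slightly from the scalar version.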

UPDATE: fixed code according to comments.

I am using Visual Studio 2017. Compiled for x64.

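Optimization: Maximum Optimization (Favor Speed) (/O2);
Inline Function Expansion: Any Suitable (/Ob2);
Favor Size Or Speed: Favor fast code (/Ot);
Omit Frame Pointers: Yes (/Oy)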

Compilers already generate optimized code, so nowadays it is hard to accelerate it even more. The one thing you can do to speed the code up further is parallelization.
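
As a rough illustration of the parallelization mentioned above, the sketch below splits the outer y loop into contiguous row ranges and gives each range to its own std::thread. It is only one possible approach under stated assumptions: the helper names sum_rows and sum_squares_parallel are hypothetical, the accumulator is widened to 64 bits, and no speedup is measured or claimed here.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: scalar sum of x*x + y*y over rows [y_begin, y_end),
// using the same four-lane index pattern as the question's loop.
inline long long sum_rows(int width, int y_begin, int y_end)
{
    long long total = 0;
    for (int y = y_begin; y < y_end; y++)
        for (int x = 0; x < width; x++)
        {
            float sum = 0.0f;
            for (int lane = 0; lane < 4; lane++)
            {
                const float fx = static_cast<float>(x + lane);
                const float fy = static_cast<float>(y + lane);
                sum += fx * fx + fy * fy;
            }
            total += static_cast<long long>(sum);
        }
    return total;
}

// Hypothetical driver: one thread per chunk of rows, partial sums combined
// after all threads have joined.
inline long long sum_squares_parallel(int width, int height, unsigned num_threads)
{
    assert(width >= 0 && height >= 0 && num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; t++)
    {
        // Contiguous, nearly equal row ranges; a range may be empty when
        // there are more threads than rows.
        const int y_begin = static_cast<int>(static_cast<long long>(height) * t / num_threads);
        const int y_end = static_cast<int>(static_cast<long long>(height) * (t + 1) / num_threads);
        workers.emplace_back([&partial, t, width, y_begin, y_end] {
            partial[t] = sum_rows(width, y_begin, y_end);  // each thread writes its own slot
        });
    }

    for (std::thread& worker : workers)
        worker.join();

    long long total = 0;
    for (long long p : partial)
        total += p;
    return total;
}
```

A call such as sum_squares_parallel(5120, 5120, 4) would cover the question's range with four threads. std::thread::hardware_concurrency() can suggest a thread count, but it may return 0, so a fallback value is needed.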

Thanks for the answers. They are mostly the same, so I accepted Søren V. Poulsen's answer because it was the first.
