NEON float multiplication is slower than expected

Posted: 2021/11/17 22:13:09   Tags: c++ gcc arm simd neon


Question


I have two arrays of floats. I need to multiply each element of the first array by the corresponding element of the second array and store the result in a third array.
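As a baseline, the same element-wise product can be written with std::transform. This is a minimal sketch; the name mul_arrays and the explicit length parameter are hypothetical (the question uses a global n instead). It is a simple loop with no dependency between iterations, which is the shape compilers may auto-vectorize when optimization is enabled; whether GCC actually emits NEON instructions for it depends on the flags used.

```cpp
#include <algorithm>   // std::transform
#include <cstddef>     // std::size_t
#include <functional>  // std::multiplies

// Hypothetical helper: tr[i] = t1[i] * t2[i] for every i in [0, n).
// Each iteration is independent, so the loop is a candidate for
// auto-vectorization at higher optimization levels.
void mul_arrays(const float* t1, const float* t2, float* tr, std::size_t n) {
    std::transform(t1, t1 + n, t2, tr, std::multiplies<float>());
}
```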


I would like to use NEON to parallelize the float multiplications: four multiplications at once instead of one.
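To make "four multiplications at once" concrete: a NEON Q register holds four 32-bit floats (the float32x4_t type in arm_neon.h), and vmulq_f32 multiplies two such registers lane by lane. The portable model below only illustrates those semantics; the Float4 alias and the mul4 function are hypothetical stand-ins, not part of arm_neon.h, and this code performs four ordinary scalar multiplications rather than one SIMD instruction.

```cpp
#include <array>

// Hypothetical stand-in for float32x4_t: four float lanes.
using Float4 = std::array<float, 4>;

// Models the result of vmulq_f32(a, b): lane i of the result is a[i] * b[i].
// This model is written as a loop purely to show the lane-wise semantics.
Float4 mul4(const Float4& a, const Float4& b) {
    Float4 r{};
    for (int lane = 0; lane < 4; ++lane)
        r[lane] = a[lane] * b[lane];
    return r;
}
```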


I expected a significant speed-up, but I only achieved about a 20% reduction in execution time. This is my code:

#include <stdlib.h>   // rand, srand, RAND_MAX
#include <arm_neon.h> // NEON intrinsics

const int n = 100; // array size (the NEON version assumes n is a multiple of 4)

/* fill an array with random floats in [0, 1] */
void rand_tab(float *t) {
    for (int i = 0; i < n; i++)
        t[i] = (float)rand()/(float)RAND_MAX;
}

/* Multiply elements of two arrays and store the results in a third array
 - STANDARD processing. */
void mul_tab_standard(float *t1, float *t2, float *tr) {
    for (int i = 0; i < n; i++)
         tr[i] = t1[i] * t2[i];
}

/* Multiply elements of two arrays and store the results in a third array
 - NEON processing, four floats per iteration. */
void mul_tab_neon(float *t1, float *t2, float *tr) {
    for (int i = 0; i < n; i+=4)
        vst1q_f32(tr+i, vmulq_f32(vld1q_f32(t1+i), vld1q_f32(t2+i)));
}

int main() {
    float t1[n], t2[n], tr[n];

    /* fill arrays with random values */
    srand(1); rand_tab(t1); rand_tab(t2);

    // I repeat the multiplication 1000000 times for measuring purposes:
    for (int k = 0; k < 1000000; k++)
        mul_tab_standard(t1, t2, tr);  // swap in the next line for comparison:
    //mul_tab_neon(t1, t2, tr);
    return 0;
}
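The NEON version above is only correct because n = 100 is a multiple of 4; with, for example, n = 102 the last iteration would load and store four floats starting at index 100, past the end of the arrays. A minimal sketch of the same kernel with a scalar tail loop, plus a plain-loop fallback when neither __ARM_NEON nor __ARM_NEON__ is defined so it also compiles on non-NEON targets. The function name mul_arrays_neon and the n parameter are hypothetical. This fixes correctness for arbitrary sizes; it is not by itself expected to change the timing.

```cpp
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: tr[i] = t1[i] * t2[i] for i in [0, n), any n.
void mul_arrays_neon(const float* t1, const float* t2, float* tr, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Main loop: four floats per iteration in one 128-bit register.
    for (; i + 4 <= n; i += 4) {
        float32x4_t a = vld1q_f32(t1 + i);
        float32x4_t b = vld1q_f32(t2 + i);
        vst1q_f32(tr + i, vmulq_f32(a, b));
    }
#endif
    // Scalar tail for the last n % 4 elements (or the whole array when
    // NEON is not available).
    for (; i < n; ++i)
        tr[i] = t1[i] * t2[i];
}
```

In the question's program, mul_arrays_neon(t1, t2, tr, n) could stand in for either call in main().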


I run the following command to compile: g++ -mfpu=neon -ffast-math neon_test.cpp
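That command has no -O option, so GCC compiles at its default level, -O0, and the timings compare unoptimized builds. A minimal sketch for checking which relevant options actually reached the compiler, using macros that GCC and Clang predefine (__OPTIMIZE__, __FAST_MATH__, and __ARM_NEON / __ARM_NEON__). The names BuildInfo, build_info and print_build_info are hypothetical.

```cpp
#include <cstdio>

// Which options were in effect when this translation unit was compiled.
struct BuildInfo {
    bool optimized;  // __OPTIMIZE__: defined when optimization is enabled (-O1 and above)
    bool fast_math;  // __FAST_MATH__: defined by -ffast-math
    bool neon;       // __ARM_NEON / __ARM_NEON__: NEON code generation enabled
};

BuildInfo build_info() {
    BuildInfo info{false, false, false};
#ifdef __OPTIMIZE__
    info.optimized = true;
#endif
#ifdef __FAST_MATH__
    info.fast_math = true;
#endif
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    info.neon = true;
#endif
    return info;
}

void print_build_info() {
    BuildInfo info = build_info();
    std::printf("optimized: %s\nfast-math: %s\nneon: %s\n",
                info.optimized ? "yes" : "no",
                info.fast_math ? "yes" : "no",
                info.neon ? "yes" : "no");
}
```

Calling print_build_info() from the question's main() and rebuilding with an optimization level added (for example -O2 or -O3) shows whether each setting took effect.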


My CPU: ARMv7 Processor rev 0 (v7l)


Do you have any ideas how I can achieve a more significant speed-up?
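For scale: four lanes give an ideal upper bound of 4x on the multiplications themselves, whereas a 20% reduction in execution time is a speed-up of 1/(1 - 0.20) = 1.25x. Before chasing that gap, it helps to make sure the benchmark times what it is meant to: main() never reads tr, so once optimization is enabled the compiler is free to shrink or remove the repeated calls. A minimal timing sketch with std::chrono, assuming GCC or Clang (the empty asm statement is a compiler extension); time_kernel and MulFn are hypothetical names.

```cpp
#include <chrono>

// Signature of both kernels in the question (they read the global n).
using MulFn = void (*)(float*, float*, float*);

// Times `reps` calls of fn on the same arrays and returns elapsed seconds.
// The empty asm statement with a "memory" clobber (GCC/Clang extension)
// tells the compiler memory may be read or modified at that point, so it
// cannot drop the stores to tr or merge the repeated calls.
double time_kernel(MulFn fn, float* t1, float* t2, float* tr, long reps) {
    auto start = std::chrono::steady_clock::now();
    for (long k = 0; k < reps; ++k) {
        fn(t1, t2, tr);
        asm volatile("" : : : "memory");
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing time_kernel(mul_tab_standard, t1, t2, tr, 1000000) and time_kernel(mul_tab_neon, t1, t2, tr, 1000000) in the same run keeps both measurements under identical compiler flags.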
