Time performance when permuting and casting double to float

Question

I have some big arrays given by MATLAB to C++ (so I need to take them as they are) that need casting and permuting (row-major vs. column-major issues).

The array imgaux is of type double, has size size_proj[0]*size_proj[1]*size_proj[2], and needs to be cast to float while moving some of the values to new locations. A minimal example is as follows:

#include <stdio.h>   /* printf, getchar */
#include <stdlib.h>  /* malloc, free */
#include <time.h>    /* clock, CLOCKS_PER_SEC */

int main(void) {
    int size_proj[3];
    size_proj[0] = 512;
    size_proj[1] = 512;
    size_proj[2] = 360;
    size_t num_elems = (size_t)size_proj[0] * size_proj[1] * size_proj[2];
    size_t num_byte_double = num_elems * sizeof(double);
    size_t num_byte_float = num_elems * sizeof(float);

    double *imgaux = (double*) malloc(num_byte_double);
    float *img = (float*) malloc(num_byte_float);
    if (imgaux == NULL || img == NULL)
        return 1;

    clock_t begin, end;
    double time_spent;

    begin = clock();
    /* Element (k, i, j): k varies fastest in imgaux, i varies fastest in img. */
    for (int k = 0; k < size_proj[0]; k++)
        for (int i = 0; i < size_proj[1]; i++)
            for (int j = 0; j < size_proj[2]; j++)
                img[i + k*size_proj[1] + j*size_proj[0] * size_proj[1]] = (float)imgaux[k + i*size_proj[0] + j*size_proj[0] * size_proj[1]];
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Time permuting and casting the input %f\n", time_spent);
    free(imgaux);
    free(img);
    getchar();
    return 0;
}
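The index expressions in that loop encode the whole permutation. As a minimal sketch (the helper names src_index and dst_index are hypothetical, not from the question), here are the two formulas pulled out of the loop: element (k, i, j) sits at one linear position in imgaux and at another in img, with n0 and n1 standing for size_proj[0] and size_proj[1].

```c
#include <assert.h>
#include <stddef.h>

/* Index of element (k, i, j) in imgaux: k varies fastest, then i, then j
 * (the column-major layout MATLAB hands over). */
static size_t src_index(size_t k, size_t i, size_t j, size_t n0, size_t n1)
{
    assert(k < n0 && i < n1);   /* debug-only bounds check */
    return k + i * n0 + j * n0 * n1;
}

/* Index of the same element in img: the first two dimensions are swapped,
 * so i now varies fastest, then k, then j. */
static size_t dst_index(size_t k, size_t i, size_t j, size_t n0, size_t n1)
{
    assert(k < n0 && i < n1);   /* debug-only bounds check */
    return i + k * n1 + j * n0 * n1;
}
```

For example, with size_proj = {512, 512, 360}, stepping k by one moves 1 element in imgaux but 512 elements in img, while stepping i does the opposite; stepping j moves 512*512 elements in both.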

However, this is a huge performance bottleneck: it takes up to 6 seconds for big arrays (512*512*300).

I know that if, instead of the 3D indexing, I simply did

for (int k = 0; k < size_proj[0]*size_proj[1]*size_proj[2]; k++)
    img[k] = (float)imgaux[k];

the code would take about 0.2 seconds to run. However, I need the "permutation" of the dimensions as in the first code snippet.

Is there a way to speed up that code while still moving the values to their new positions?

Answer

OK, let's unravel your loop a bit by precomputing things as early as possible:

int max0 = size_proj[0];
int max1 = size_proj[1];
int max2 = size_proj[2];

for (int k = 0; k < max0; k++)
{
    int kOffset1 = k*max1;
    int kOffset2 = k;

    for (int i = 0; i < max1; i++)
    {
        int iOffset1 = i;
        int iOffset2 = i*max0;

        for (int j = 0; j < max2; j++)
        {
            int jOffset1 = j*max0*max1;
            int jOffset2 = j*max0*max1;


            int idx1 = iOffset1 + jOffset1 + kOffset1;
            int idx2 = iOffset2 + jOffset2 + kOffset2;
            img[idx1] = (float)imgaux[idx2];
        }
    }
}

Computing jOffset1/2 at the lowest level of your nested loop seems suboptimal: it makes idx1/2 jump by max0*max1 elements on every iteration, so consecutive accesses land far apart in memory. So let's move this to the highest level:

int max0 = size_proj[0];
int max1 = size_proj[1];
int max2 = size_proj[2];
for (int j = 0; j < max2; j++)
{
    int jOffset1 = j*max0*max1;
    int jOffset2 = j*max0*max1;

    for (int k = 0; k < max0; k++)
    {
        int kOffset1 = k*max1;
        int kOffset2 = k;

        for (int i = 0; i < max1; i++)
        {
            int iOffset1 = i;
            int iOffset2 = i*max0;

            int idx1 = iOffset1 + jOffset1 + kOffset1;
            int idx2 = iOffset2 + jOffset2 + kOffset2;
            img[idx1] = (float)imgaux[idx2];
        }
    }
}
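To make the stride argument concrete, here is the arithmetic for the 512 × 512 × 360 case, using the index formulas from the question (n_0, n_1 denote size_proj[0], size_proj[1]). With j innermost, every iteration of both the read and the write lands megabytes away from the previous one; with i innermost, writes become contiguous and reads stride by one column. This plausibly accounts for much of the speedup, though the exact effect depends on the cache hierarchy.

```latex
\begin{aligned}
\mathrm{idx}_1 &= i + k\,n_1 + j\,n_0 n_1 && \text{(img, float, 4 B)}\\
\mathrm{idx}_2 &= k + i\,n_0 + j\,n_0 n_1 && \text{(imgaux, double, 8 B)}\\[6pt]
j \text{ innermost:}\quad \Delta\mathrm{idx}_1 = \Delta\mathrm{idx}_2 &= n_0 n_1 = 512 \cdot 512 = 262144
  && \Rightarrow\ 1\ \text{MiB per write step},\ 2\ \text{MiB per read step}\\
i \text{ innermost:}\quad \Delta\mathrm{idx}_1 = 1,\quad \Delta\mathrm{idx}_2 &= n_0 = 512
  && \Rightarrow\ 4\ \text{B per write step},\ 4\ \text{KiB per read step}
\end{aligned}
```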

That already looks better. kOffset1/2 and iOffset1/2 can't be optimized any further, but we still have unnecessary variables and declarations. Let's consolidate them:

for (int j = 0; j < size_proj[2]; j++)
{
    int jOffset = j*size_proj[0]*size_proj[1];
    for (int k = 0; k < size_proj[0]; k++)
    {
        int kOffset1 = k*size_proj[1];
        for (int i = 0; i < size_proj[1]; i++)
        {
            int iOffset2 = i*size_proj[0];
            img[i + jOffset + kOffset1] = (float)imgaux[iOffset2 + jOffset + k];
        }
    }
}






I tried your updated MCVE with your loop and with mine (same system, using MSVC14):

Yours:

Time permuting and casting the input 4.180000

Mine:


Time permuting and casting the input 0.704000

Hope I could help ;-)

As @BarryTheHatchet pointed out (and since it is easily overlooked in the comment section): instead of using an array of 3 int values for size_proj, you are better off using three const int values.

Not using an array removes complexity from your code (provided you use descriptive names, of course). Using const prevents you from accidentally changing the values in a complex calculation and may allow the compiler to optimize better.
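A minimal sketch of that suggestion, assuming the dimensions still arrive as size_proj: they are copied once into const int locals with descriptive names (sizeK, sizeI and sizeJ, after the loop variables they bound, are hypothetical names), and the answer's final loop uses only those. The function name permute_cast is also hypothetical. int is sufficient for 512*512*360 = 94,371,840 elements but would overflow beyond INT_MAX elements, where size_t would be the safer choice.

```c
#include <assert.h>
#include <stddef.h>

/* Permute (k, i, j) -> (i, k, j) and cast double -> float. */
static void permute_cast(float *img, const double *imgaux, const int size_proj[3])
{
    const int sizeK = size_proj[0];          /* fastest dimension of imgaux */
    const int sizeI = size_proj[1];          /* fastest dimension of img */
    const int sizeJ = size_proj[2];          /* slowest dimension of both */
    const int sliceSize = sizeK * sizeI;     /* elements per j-slice */

    assert(img != NULL && imgaux != NULL);
    for (int j = 0; j < sizeJ; j++) {
        const int jOffset = j * sliceSize;   /* same offset in both arrays */
        for (int k = 0; k < sizeK; k++) {
            const int kOffset = k * sizeI;   /* start of row k in img */
            for (int i = 0; i < sizeI; i++)  /* contiguous writes, reads stride sizeK */
                img[i + kOffset + jOffset] = (float)imgaux[i * sizeK + k + jOffset];
        }
    }
}
```

Usage would be permute_cast(img, imgaux, size_proj) in place of the triple loop in the question.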

As @paddy pointed out: you can replace the multiplications at the different levels of your nested loop with additions by precomputing the step sizes.

I tried this, but there was no real difference between the multiplication version and the step version:

const int jStep     = size_proj[0] * size_proj[1];
const int jStepMax  = size_proj[0] * size_proj[1] * size_proj[2];
const int kStep1 = size_proj[1];
const int kStep1Max = size_proj[0] * size_proj[1];
const int kStep2 = 1;
const int kStep2Max = size_proj[0];
const int iStep1 = 1;
const int iStep1Max = size_proj[1];
const int iStep2 = size_proj[0];
const int iStep2Max = size_proj[0] * size_proj[1];

for (int jOffset = 0; jOffset < jStepMax; jOffset += jStep)
{
    for (int kOffset1 = 0, kOffset2=0; kOffset1 < kStep1Max && kOffset2 < kStep2Max; kOffset1+=kStep1, kOffset2+=kStep2)
    {
        for (int iOffset1 = 0, iOffset2 = 0; iOffset1 < iStep1Max && iOffset2 < iStep2Max; iOffset1 += iStep1, iOffset2 += iStep2)
        {
            img[iOffset1 + jOffset + kOffset1] = (float)imgaux[iOffset2 + jOffset + kOffset2];
        }
    }
}
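The step idea can be written more compactly (a variation, not the code measured above): since img is written strictly in order, the write position can be a pointer that advances by one, and the read position an index that advances by size_proj[0]. The function name permute_cast_steps and its parameters n0, n1, n2 (for size_proj[0..2]) are hypothetical. Optimizing compilers commonly perform this kind of strength reduction themselves, which may be why both versions timed about the same.

```c
#include <assert.h>
#include <stddef.h>

/* Permute (k, i, j) -> (i, k, j) and cast double -> float using only additions
 * in the innermost loop. */
static void permute_cast_steps(float *img, const double *imgaux,
                               size_t n0, size_t n1, size_t n2)
{
    assert(img != NULL && imgaux != NULL);
    const size_t slice = n0 * n1;        /* elements per j-slice */
    float *dst = img;                    /* img is written strictly in order */
    for (size_t j = 0; j < n2; j++) {
        for (size_t k = 0; k < n0; k++) {
            size_t s = j * slice + k;    /* source index of element (k, 0, j) */
            for (size_t i = 0; i < n1; i++, s += n0)
                *dst++ = (float)imgaux[s];   /* next i: one column further in imgaux */
        }
    }
}
```

Using an index for the source rather than a pointer avoids forming a pointer more than one past the end of imgaux after the last step.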
