C loop optimization help for final assignment (with compiler optimization disabled)

Question

So for my final assignment in my Computer Systems class, we need to optimize these for loops to be faster than the original.

The basic grade is under 7 seconds and the full grade is under 5 seconds on our Linux server. The code I have here takes about 5.6 seconds. I think I may need to use pointers in some way to make it faster, but I'm not really sure. Could anyone offer any tips or options?

The file must remain 50 lines or fewer, not counting the comment lines the instructor has included.

#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.

// The code you submit must have these two values.
#define N_TIMES     600000
#define ARRAY_SIZE   10000

int main(void)
{
    double  *array = calloc(ARRAY_SIZE, sizeof(double));
    double  sum = 0;
    int     i;

    // You can add variables between this comment ...
    register double sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0, sum6 = 0, sum7 = 0, sum8 = 0, sum9 = 0;
    register int j;
    // ... and this one.

    printf("CS201 - Asgmt 4 - \n");

    for (i = 0; i < N_TIMES; i++)
    {
        // You can change anything between this comment ...
        for (j = 0; j < ARRAY_SIZE; j += 10)
        {
            sum += array[j];
            sum1 += array[j + 1];
            sum2 += array[j + 2];
            sum3 += array[j + 3];
            sum4 += array[j + 4];
            sum5 += array[j + 5];
            sum6 += array[j + 6];
            sum7 += array[j + 7];
            sum8 += array[j + 8];
            sum9 += array[j + 9];
        }
        // ... and this one. But your inner loop must do the same
        // number of additions as this one does.
    }                   

    // You can add some final code between this comment ...
    sum += sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + sum8 + sum9;
    // ... and this one.

    return 0;
}

Recommended answer

You may be on the right track, though you'll need to measure it to be certain (my normal advice to measure, not guess, seems a little superfluous here since the whole point of the assignment is to measure).

Optimising compilers will probably not see much of a difference since they're pretty clever about that sort of stuff but, since we don't know what optimisation level it will be compiling at, you may get a substantial improvement.

To use pointers in the inner loop is a simple matter of first adding a pointer variable:

register double *pj;

Then change the loop to:

for (pj = &(array[0]); pj < &(array[ARRAY_SIZE]); pj++) {
    sum += *pj++;
    sum1 += *pj++;
    sum2 += *pj++;
    sum3 += *pj++;
    sum4 += *pj++;
    sum5 += *pj++;
    sum6 += *pj++;
    sum7 += *pj++;
    sum8 += *pj++;
    sum9 += *pj;   // the tenth element; the loop header's pj++ moves on to the next group
}

This keeps the amount of additions the same within the loop (assuming you're counting += and ++ as addition operators, of course) but basically uses pointers rather than array indexes.
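As a quick sanity check before timing anything, the two traversal styles can be compared on a small array. The sketch below is illustrative only: the function names `sum_indexed` and `sum_pointer` are hypothetical, and the loop is unrolled four ways rather than ten purely to keep the example short. It assumes the element count is a multiple of the unroll factor, as `ARRAY_SIZE` is in the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed traversal: four accumulators, one addition into each per iteration. */
double sum_indexed(const double *array, size_t n)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t j;

    assert(n % 4 == 0);           /* assumption: n is a multiple of the unroll factor */
    for (j = 0; j < n; j += 4) {
        s0 += array[j];
        s1 += array[j + 1];
        s2 += array[j + 2];
        s3 += array[j + 3];
    }
    return s0 + s1 + s2 + s3;
}

/* Pointer traversal: same additions in the same order, but the address is
 * advanced with ++ instead of being recomputed from an index each time. */
double sum_pointer(const double *array, size_t n)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    const double *p;
    const double *end = array + n;   /* one past the last element */

    assert(n % 4 == 0);
    for (p = array; p < end; p++) {
        s0 += *p++;
        s1 += *p++;
        s2 += *p++;
        s3 += *p;                    /* the loop header's p++ steps to the next group */
    }
    return s0 + s1 + s2 + s3;
}
```

Because both functions add the same elements into the same accumulators in the same order, they should return identical results for any input whose length is a multiple of four.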

With no optimisation [1] on my system, this drops it from 9.868 seconds (CPU time) to 4.84 seconds. Your mileage may vary.
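The answer does not say how its CPU times were obtained. One possible way to take a comparable measurement from inside a program is the standard `clock()` function, sketched below. The names `cpu_seconds` and `busy_work` are hypothetical helpers introduced for illustration, and `busy_work` is only a stand-in workload, not the assignment's loop.

```c
#include <time.h>

/* Written to at the end of busy_work so the result counts as "used". */
static volatile double sink;

/* Stand-in workload (an assumption for this sketch): a simple summation loop. */
void busy_work(void)
{
    double sum = 0;
    long i;

    for (i = 0; i < 1000000; i++)
        sum += (double)i;
    sink = sum;
}

/* Returns the processor time, in seconds, consumed by one call to work(). */
double cpu_seconds(void (*work)(void))
{
    clock_t start = clock();
    work();
    clock_t stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `cpu_seconds(busy_work)` before and after a change gives two numbers to compare. Running each measurement several times helps, since individual runs can vary.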

[1] With optimisation level -O3, both are reported as taking 0.001 seconds so, as mentioned, the optimisers are pretty clever. However, given you're seeing 5+ seconds, I'd suggest it wasn't being compiled with optimisation on.

As an aside, this is a good reason why it's usually advisable to write your code in a readable manner and let the compiler take care of getting it running faster. While my meager attempts at optimisation roughly doubled the speed, using -O3 made it run some ten thousand times faster :-)
