For C++ Vector3 utility class implementations, is an array faster than a struct or a class?

Question

Just out of curiosity, I implemented Vector3 utilities in three ways: as an array (with a typedef), as a class, and as a struct.

This is the array implementation:

typedef float newVector3[3];

namespace vec3{
    void add(const newVector3& first, const newVector3& second, newVector3& out_newVector3);
    void subtract(const newVector3& first, const newVector3& second, newVector3& out_newVector3);
    void dot(const newVector3& first, const newVector3& second, float& out_result);
    void cross(const newVector3& first, const newVector3& second, newVector3& out_newVector3);
}

// implementations, nothing fancy...really
namespace vec3{
    void add(const newVector3& first, const newVector3& second, newVector3& out_newVector3)
    {
        out_newVector3[0] = first[0] + second[0];
        out_newVector3[1] = first[1] + second[1];
        out_newVector3[2] = first[2] + second[2];
    }

    void subtract(const newVector3& first, const newVector3& second, newVector3& out_newVector3){
        out_newVector3[0] = first[0] - second[0];
        out_newVector3[1] = first[1] - second[1];
        out_newVector3[2] = first[2] - second[2];
    }

    void dot(const newVector3& first, const newVector3& second, float& out_result){
        out_result = first[0]*second[0] + first[1]*second[1] + first[2]*second[2];
    }

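    // Note: as written, this is the component-wise product, not the mathematical
    // cross product (whose first component would be first[1]*second[2] - first[2]*second[1]).
    void cross(const newVector3& first, const newVector3& second, newVector3& out_newVector3){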
        out_newVector3[0] = first[0] * second[0];
        out_newVector3[1] = first[1] * second[1];
        out_newVector3[2] = first[2] * second[2];
    }
}

And a class implementation:

class Vector3{
private:
    float x;
    float y;
    float z;

public:
    // constructors
    Vector3(float new_x, float new_y, float new_z){
        x = new_x;
        y = new_y;
        z = new_z;
    }

    Vector3(const Vector3& other){
        if(&other != this){
            this->x = other.x;
            this->y = other.y;
            this->z = other.z;
        }
    }
};

Of course, it contains other functionality that usually appears in a Vector3 class.

And finally, a struct implementation:

struct s_vector3{
    float x;
    float y;
    float z;

    // constructors
    s_vector3(float new_x, float new_y, float new_z){
        x = new_x;
        y = new_y;
        z = new_z;
    }

    s_vector3(const s_vector3& other){
        if(&other != this){
            this->x = other.x;
            this->y = other.y;
            this->z = other.z;
        }
    }
    // ... other members omitted
};

Again, I omitted some other common Vector3 functionality. Now I let all three of them create 9,000,000 new objects and perform 9,000,000 cross products. (After each one finishes, I write a huge chunk of data to flush the cache, so that cached data does not give any of them an advantage.)

Here is the test code:

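#include <cstdio>
#include <cstdlib>
#include <ctime>

const int K_OPERATION_TIME = 9000000;
const size_t bigger_than_cachesize = 20 * 1024 * 1024;

void cleanCache()
{
    // flush the cache by writing a buffer larger than the cache
    // (20 Mi elements of type long, i.e. well over 20 MB)
    long *p = new long[bigger_than_cachesize];
    for(size_t i = 0; i < bigger_than_cachesize; i++)
    {
       p[i] = rand();
    }
    delete[] p; // release the buffer
}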

int main(){

    cleanCache();
    // first, the Vector3 struct
    std::clock_t start;
    double duration;

    start = std::clock();

    for(int i = 0; i < K_OPERATION_TIME; ++i){
        s_vector3 newVector3Struct = s_vector3(i,i,i);
        newVector3Struct = s_vector3::cross(newVector3Struct, newVector3Struct);
    }

    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    printf("The struct implementation of Vector3 takes %f seconds.\n", duration);

    cleanCache();
    // second, the Vector3 array implementation
    start = std::clock();

    for(int i = 0; i < K_OPERATION_TIME; ++i){
        newVector3 newVector3Array = {i, i, i};
        newVector3 opResult;
        vec3::cross(newVector3Array, newVector3Array, opResult);
    }

    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    printf("The array implementation of Vector3 takes %f seconds.\n", duration);

    cleanCache();
    // Third, the Vector3 class implementation
    start = std::clock();

    for(int i = 0; i < K_OPERATION_TIME; ++i){
        Vector3 newVector3Class = Vector3(i,i,i);
        newVector3Class = Vector3::cross(newVector3Class, newVector3Class);
    }

    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    printf("The class implementation of Vector3 takes %f seconds.\n", duration);


    return 0;
}

The result is surprising.

The struct and class implementations finish the task in around 0.23 seconds, whereas the array implementation takes only 0.08 seconds!

If arrays do have a significant performance advantage like this, then although the syntax is ugly, they would be worth using in a lot of cases.

So I really want to make sure: is this what is supposed to happen? Thanks!

Recommended answer

Short answer: it depends. As you observed, there is a difference if the code is compiled without optimization.

When I compile (with all functions inlined) with optimization on (-O2 or -O3), there is no difference (but read on: it is not as simple as it seems).

 Optimization    Time in seconds (struct vs. array)
    -O0              0.27 vs. 0.12
    -O1              0.14 vs. 0.04
    -O2              0.00 vs. 0.00
    -O3              0.00 vs. 0.00

There is no guarantee about which optimizations your compiler can or will perform, so the complete answer is "it depends on your compiler". As a first step I would trust my compiler to do The Right Thing; otherwise I might as well start programming in assembly. Only if this part of the code is a real bottleneck is it worth thinking about helping the compiler.

If compiled with -O2, your code takes exactly 0.0 seconds for both versions, but this is because the optimizer sees that those values are not used at all, so it simply throws away the whole code!

Let's ensure that this doesn't happen:

#include <ctime>
#include <cstdio>

const int K_OPERATION_TIME = 1000000000;

int main(){
    std::clock_t start;
    double duration;

    start = std::clock();

    double checksum=0.0;
    for(int i = 0; i < K_OPERATION_TIME; ++i){
        s_vector3 newVector3Struct = s_vector3(i,i,i);
        newVector3Struct = s_vector3::cross(newVector3Struct, newVector3Struct);
        checksum+=newVector3Struct.x +newVector3Struct.y+newVector3Struct.z; // actually using the result of cross-product!
    }

    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    printf("The struct implementation of Vector3 takes %f seconds.\n", duration);

    // second, the Vector3 array implementation
    start = std::clock();

    for(int i = 0; i < K_OPERATION_TIME; ++i){
        newVector3 newVector3Array = {i, i, i};
        newVector3 opResult;
        vec3::cross(newVector3Array, newVector3Array, opResult);
        checksum+=opResult[0] +opResult[1]+opResult[2];  // actually using the result of cross-product!
    }

    duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
    printf("The array implementation of Vector3 takes %f seconds.\n", duration);

    printf("Checksum: %f\n", checksum);
}

You will notice the following changes:

  1. The cache is not involved (there are no cache misses), so I simply deleted the code responsible for flushing it.
  2. There is no performance difference between class and struct (after compilation there really is no difference; the public/private distinction is only skin-deep syntactic sugar), so I look only at the struct.
  3. The result of the cross product is actually used, so it cannot be optimized away.
  4. There are now 1e9 iterations, to get meaningful times.

With these changes we see the following timings (Intel compiler):

 Optimization    Time in seconds (struct vs. array)
    -O0              33.2 vs. 17.1
    -O1              19.1 vs. 7.8
    -Os              19.2 vs. 7.9
    -O2              0.7 vs. 0.7
    -O3              0.7 vs. 0.7

I'm a little disappointed that -Os performs so badly, but otherwise you can see that, once optimized, there is no difference between structs and arrays!

Personally I like -Os a lot, because it produces assembly I am able to understand, so let's take a look at why it is so slow.

The most obvious thing, even without looking at the resulting assembly: s_vector3::cross returns an s_vector3 object, but we assign the result to an already existing object, so if the optimizer does not see that the old object is no longer used, it might not be able to apply RVO. So let's replace

newVector3Struct = s_vector3::cross(newVector3Struct, newVector3Struct);
checksum+=newVector3Struct.x +newVector3Struct.y+newVector3Struct.z;

with:

s_vector3 r = s_vector3::cross(newVector3Struct, newVector3Struct);
checksum+=r.x +r.y+r.z; 

The results are now: 2.14 (struct) vs. 7.9. That is quite an improvement!

My take-away: the optimizer does a great job, but we can help it a little if needed.
