Why is a simple get-statement so slow?

Question

A few years back, I got an assignment at school, where I had to parallelize a Raytracer.
It was an easy assignment, and I really enjoyed working on it.

Today, I felt like profiling the raytracer, to see if I could get it to run any faster (without completely overhauling the code). During the profiling, I noticed something interesting:

    // Sphere.Intersect
    public bool Intersect(Ray ray, Intersection hit)
    {
        double a = ray.Dir.x * ray.Dir.x +
                   ray.Dir.y * ray.Dir.y +
                   ray.Dir.z * ray.Dir.z;
        double b = 2 * (ray.Dir.x * (ray.Pos.x - Center.x) +
                        ray.Dir.y * (ray.Pos.y - Center.y) +
                        ray.Dir.z * (ray.Pos.z - Center.z));
        double c = (ray.Pos.x - Center.x) * (ray.Pos.x - Center.x) +
                   (ray.Pos.y - Center.y) * (ray.Pos.y - Center.y) +
                   (ray.Pos.z - Center.z) * (ray.Pos.z - Center.z) - Radius * Radius;

        // more stuff here
    }



According to the profiler, 25% of the CPU time was spent in get_Dir and get_Pos, which is why I decided to optimize the code in the following way:

    // Sphere.Intersect
    public bool Intersect(Ray ray, Intersection hit)
    {
        Vector3d dir = ray.Dir, pos = ray.Pos;
        double xDir = dir.x, yDir = dir.y, zDir = dir.z,
               xPos = pos.x, yPos = pos.y, zPos = pos.z,
               xCen = Center.x, yCen = Center.y, zCen = Center.z;

        double a = xDir * xDir +
                   yDir * yDir +
                   zDir * zDir;
        double b = 2 * (xDir * (xPos - xCen) +
                        yDir * (yPos - yCen) +
                        zDir * (zPos - zCen));
        double c = (xPos - xCen) * (xPos - xCen) +
                   (yPos - yCen) * (yPos - yCen) +
                   (zPos - zCen) * (zPos - zCen) - Radius * Radius;

        // more stuff here
    }



With astonishing results.

In the original code, running the raytracer with its default arguments (create a 1024x1024 image with only direct lighting and without AA) would take ~88 seconds.
In the modified code, the same would take a little less than 60 seconds.
I achieved a speedup of ~1.5 with only this little modification to the code.

At first, I thought the getters for Ray.Dir and Ray.Pos were doing some work behind the scenes that would slow the program down.

Here are the two getters:

    public Vector3d Pos
    {
        get { return _pos; }
    }

    public Vector3d Dir
    {
        get { return _dir; }
    }



So, both return a Vector3d, and that's it.

I really wonder how calling the getter could take that much longer than accessing the variable directly.

Is it because of the CPU caching variables? Or maybe the overhead from calling these methods repeatedly added up? Or maybe the JIT handling the latter case better than the former? Or maybe there's something else I'm not seeing?

Any insights would be greatly appreciated.

As @MatthewWatson suggested, I used a Stopwatch to time release builds outside of the debugger. To reduce noise, I ran the tests multiple times. As a result, the former code takes ~21 seconds (between 20.7 and 20.9) to finish, whereas the latter takes only ~19 seconds (between 19 and 19.2).
The difference has become much smaller, but it is still there.
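A measurement like the one described above could be sketched roughly as follows. This is only an illustration, not the harness actually used: the `Timing` class, the `Measure` helper, the warm-up call, and the placeholder workload are all assumptions. `System.Diagnostics.Stopwatch` is the standard .NET timer.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: runs `work` `runs` times and returns the fastest
    // and slowest wall-clock time observed.
    public static (TimeSpan min, TimeSpan max) Measure(Action work, int runs)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        work(); // warm-up run (an assumption), so JIT compilation is not timed

        TimeSpan min = TimeSpan.MaxValue, max = TimeSpan.Zero;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            work();
            sw.Stop();
            if (sw.Elapsed < min) min = sw.Elapsed;
            if (sw.Elapsed > max) max = sw.Elapsed;
        }
        return (min, max);
    }

    public static void Main()
    {
        // Placeholder workload; in the question this would be the full render.
        Action work = () =>
        {
            double sum = 0;
            for (int i = 0; i < 1000000; i++) sum += Math.Sqrt(i);
        };

        var (min, max) = Measure(work, 5);
        Console.WriteLine($"min {min.TotalMilliseconds:F1} ms, max {max.TotalMilliseconds:F1} ms");
    }
}
```

Reporting the minimum and maximum mirrors the "between 20.7 and 20.9" style of the figures above. The numbers are only meaningful for a release build started without a debugger attached.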

Recommended answer

Introduction

I'd be willing to bet that the original code is so much slower because of a quirk in C# involving properties whose type is a struct. It's not exactly intuitive, but this kind of property is inherently slow. Why? Because structs are not passed by reference. So in order to access ray.Dir.x, you have to:

  1. Load local variable ray.
  2. Call get_Dir and store the result in a temporary variable. This involves copying the entire struct, even though only the field 'x' is ever used.
  3. Access field x from the temporary copy.
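The copy in step 2 can be made visible with a small sketch. The `Vector3d` and `Ray` types below are hypothetical stand-ins that only mimic the shape of the getters shown in the question, and `CopyDemo` is an illustrative name. Mutating the value obtained from the property leaves the ray's own field untouched, which shows that the getter handed back a copy.

```csharp
using System;

// Hypothetical stand-in for the question's vector type.
public struct Vector3d
{
    public double x, y, z;
    public Vector3d(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical stand-in for the question's Ray class.
public class Ray
{
    private Vector3d _dir;
    public Ray(Vector3d dir) { _dir = dir; }

    // Same shape as the getter in the question: returns the struct by value.
    public Vector3d Dir { get { return _dir; } }
}

public static class CopyDemo
{
    // Returns the x seen through the property and the x of the mutated local copy.
    public static (double viaProperty, double viaCopy) Run()
    {
        var ray = new Ray(new Vector3d(1, 2, 3));

        Vector3d copy = ray.Dir; // step 2: the getter returns a full copy
        copy.x = 42;             // modifies only the local copy

        return (ray.Dir.x, copy.x); // the ray's own field is unchanged
    }

    public static void Main()
    {
        var (viaProperty, viaCopy) = Run();
        Console.WriteLine($"ray.Dir.x = {viaProperty}, copy.x = {viaCopy}");
        // prints: ray.Dir.x = 1, copy.x = 42
    }
}
```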

Looking at the original code, the get accessors are called 18 times. This is a huge waste, because it means that the entire struct is copied 18 times overall. In your optimized code, there are only two copies - Dir and Pos are both called only once; further access to the values consists only of the third step from above:


  1. Access field x from the temporary copy.

To sum it up, structs and properties do not go together.
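If the property has to stay, one possible way around the copy in newer language versions is a `ref readonly` return, which hands out a read-only reference to the field instead of a copy. This is not part of the original answer. It is a sketch that assumes C# 7.2 or later, and the `SetDir` method and `RefReturnDemo` class are hypothetical, added only to show that the reference aliases the field.

```csharp
using System;

// Hypothetical stand-in for the question's vector type, made immutable here.
public readonly struct Vector3d
{
    public readonly double x, y, z;
    public Vector3d(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

public class Ray
{
    private Vector3d _dir;
    public Ray(Vector3d dir) { _dir = dir; }

    // Returns a read-only reference to the field rather than a copy (C# 7.2+).
    public ref readonly Vector3d Dir => ref _dir;

    // Hypothetical mutator, present only so the demo can show aliasing.
    public void SetDir(Vector3d dir) { _dir = dir; }
}

public static class RefReturnDemo
{
    public static double Run()
    {
        var ray = new Ray(new Vector3d(1, 2, 3));

        ref readonly Vector3d dir = ref ray.Dir; // alias of the field, no copy
        ray.SetDir(new Vector3d(4, 5, 6));

        return dir.x; // 4: the reference observes the update
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Whether this actually beats caching the struct in a local variable, as the optimized code in the question does, would have to be measured for the specific runtime and JIT in use.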

It has something to do with the fact that in C#, structs are value types. You are passing around the value itself, rather than a pointer to the value.

In debug mode, optimizations that would eliminate these copies are skipped to provide a better debugging experience. Even in release mode, you'll find that most JIT compilers often don't do this. I don't know exactly why, but I believe it is because the field is not always word-aligned. Modern CPUs have odd performance requirements. :-)
