一个循环执行两个动作VS两个循环执行一个动作? [英] One Loop with two actions VS two loops with one action performance?

查看:292
本文介绍了一个循环执行两个动作VS两个循环执行一个动作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对这个新手问题很感兴趣.

I have long time interested in this newbie question.

例如,我们有两种情况:

For example we have two situations:

  1. 我有一个带有两个功能的任意循环. (WhileFor无关紧要.)

for (int i = 0; i < 1000; i++)
{
    Function_1();
    Function_2();
}

  • 我有两个循环,每个循环只有一个动作.

  • I have two loops with one action each.

    for (int i = 0; i < 1000; i++)
    {
        Function_1();
    }
    
    for (int i = 0; i < 1000; i++)
    {
        Function_2();
    }
    

  • 我了解首先可以更快地工作.

    I understand that first will work faster.

    但这两种情况之间的性能速度有何不同? (以百分比为单位)

    But what difference in performance speed between this two situation? (in percentage)

    如果最大循环数会增加多少性能会降低?

    How much performance decrease if max loop count will increase?

    在这种情况下,什么(处理器或RAM)承担更大的负担?

    And what (processor or RAM) takes on a greater load in this situations?

    推荐答案

    从纯粹的理论角度来看,两者之间没有区别.无论哪种方式,它都是O(N),到此为止.

    From a purely theoretical viewpoint, there's no difference between the two. Either way, it's O(N), and that's the end of that.

    从更实际的角度来看,缓存可以在很大程度上改变这个简单的答案.一个可能比另一个更有效地使用缓存.不过,不一定会是您显示的第一个.

    From a more practical viewpoint, caching can change that simple answer quite a bit. One may use the cache considerably more effectively than the other. That won't necessarily be the first one you've shown though.

    在实际的(现代)计算机上,它基本上是一个问题,即哪个问题可以更有效地利用缓存.

    On a real (modern) computer, it basically works out as a question of which makes more effective use of the cache.

    这又取决于Function_1Function_2各自使用哪种类型的内存.如果Function_1和Function_2每个都涉及执行大量代码,那么它们每个都将适合L1指令高速缓存,但是它们两个都不合,那么第二个版本可能会更快.在这种情况下,第一个版本(在两个功能之间交替)每次执行时都必须从主存储器中加载每个功能,因此您需要从主存储器中加载代码约2000次.对于第二个代码,您需要从内存中加载一次Function_1的代码,然后从缓存中执行1000次,然后对Function_2执行相同的操作.来自主内存的总共2次加载.

    That, in turn, depends on how much of what kind of memory is used by each of Function_1 and Function_2. If Function_1 and Function_2 each involve executing quite a lot of code, so each of them will fit in the L1 instruction cache, but both of them together won't, then the second version may well be faster. In this circumstance, the first version (alternating between the two functions) will have to load each function from main memory each time it executes, so you load the code from main memory ~2000 times. With the second one, you load the code for Function_1 from memory once, execute it 1000 times from the cache, then do the same with Function_2. A total of 2 loads from main memory.

    在另一个方向上,我们假设Function_1和Function_2的 code 都可以放入指令缓存中,但是Function_1和Function_2都对相同的 data 操作,而该数据的总数太大,无法容纳在数据缓存中.

    In the other direction, let's assume that the code for both Function_1 and Function_2 can all fit in the instruction cache, but both Function_1 and Function_2 operate on the same data, and the total of that data is too large to fit in the data cache.

    这通常会扭转这种情况:在一个数据块上执行Function_1,然后在同一数据块上执行Function_2,只会从内存中加载该数据一次,然后对其进行所有必要的计算,然后加载下一个块数据等等.每个数据块仅从主存储器加载一次.

    This will typically reverse the situation: executing Function_1 on a block of data, then executing Function_2 on the same block of data will only load that data from memory once, then do all the necessary computation on it, then load the next block of data, and so on. Each block of data is loaded from main memory only once.

    在这种情况下,第二个版本的代码可能会慢2倍左右-它会加载一个内存块并在其上执行Function_1.然后,它将加载第二个内存块,并在其上执行Function_1,依此类推.用Function_1处理完所有内存后,它将返回并加载所有相同的内存块以在其上执行Function_2.

    In this case, the second version of your code may be slower by a factor of about 2--it'll load a block of memory and execute Function_1 on it. Then it'll load a second block of memory, and execute Function_1 on that, and so on. Once all the memory has been processed with Function_1, it'll go back through and load all the same blocks of memory to execute Function_2 on them.

    整个领域都在研究诸如缓存感知和缓存遗忘算法之类的东西,以帮助您针对像您这样的情况做出明智的选择.缓存感知排序基于上述模型(的更详细版本),选择如何组织计算以适合缓存组织.忽略缓存的算法更多地针对一种相对通用的缓存模型,并且几乎不管确切地组织特定缓存的方式如何,都可以提供良好的性能.

    There's an entire area of research into things like cache aware and cache oblivious algorithms to help make intelligent choices for cases like yours. The cache aware sorts are based on (more detailed versions of) models like above, choosing how to organize calculations to fit cache organization. Cache oblivious algorithms aim more toward a relatively generic model of caching, and providing good performance almost regardless of exactly how a specific cache is organized.

    这篇关于一个循环执行两个动作VS两个循环执行一个动作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆