Why does cache locality matter for array performance?
Problem Description
In the following blog there is a statement about the advantage of arrays over linked lists:
Arrays have better cache locality, which can make a pretty big difference in performance.
What does that mean? I don't understand how cache locality can provide a huge performance benefit.
Recommended Answer
See my answer about spatial and temporal locality.
In particular, arrays are contiguous blocks of memory, so a large chunk of an array will be loaded into the cache upon first access. This makes it comparatively quick to access subsequent elements of the array. Linked lists, on the other hand, aren't necessarily stored in contiguous blocks of memory, which can lead to more cache misses and increase the time it takes to access their elements.
Consider the following possible memory layouts for an array data and a linked list l_data of large structs:
Address Contents | Address Contents
ffff 0000 data[0] | ffff 1000 l_data
ffff 0040 data[1] | ....
ffff 0080 data[2] | ffff 3460 l_data->next
ffff 00c0 data[3] | ....
ffff 0100 data[4] | ffff 8dc0 l_data->next->next
| ffff 8e00 l_data->next->next->next
| ....
| ffff 8f00 l_data->next->next->next->next
If we wanted to loop through this array, the first access to ffff 0000 would require us to go to memory to retrieve it (a very slow operation in CPU cycles). However, after that first access the rest of the array would be in the cache, and subsequent accesses would be much quicker. With the linked list, the first access to ffff 1000 would also require us to go to memory. Unfortunately, the processor will cache only the memory directly surrounding this location, say all the way up to ffff 2000. As you can see, this doesn't actually capture any of the other elements of the list, which means that when we go to access l_data->next, we will again have to go to memory.