阵列的缓存未命中率 [英] Cache miss rate of array

查看:122
本文介绍了阵列的缓存未命中率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图弄清楚如何计算数组的未命中率。我有答案,但是我不明白答案是如何得出的。
我有以下代码:

I'm trying to figure out how to calculate the miss rate of an array. I have the answer, but I'm not understanding how the answer was arrived at. I have the following code:

int C[N1][N2];
int A[N1][N3];
int B[N3][N2];

initialize_arrays(A, B, C, N1, N2, N3);

for(i=0; i<N1; ++i)
   for(j=0; j<N2; ++j)
      for(k=0; k<N3, ++k)
         C[i][j] += A[i][k] * B[k][j];

我也有以下信息: N1 = N2 = N3 = 2048 (这是什么意思?)。该处理器的L1数据高速缓存为32kB,线大小为64B(无L2高速缓存)。 (行大小是多少?)

I also have the following info: N1=N2=N3=2048 (what does this mean??). The processor has an L1 data cache of 32kB with line size of 64B (no L2 cache). (what is line size?)

我知道数组C的未命中率为(N ^ 2/16)/ N ^ 3 。我知道公式是(总未命中)/(总访问量)。我看到总共有 N ^ 3 次访问,但是它们是如何得到总未命中次数的?

I know the miss rate of array C is (N^2/16)/N^3. I know the formula is (total misses)/(total accesses). I see that there are N^3 total accesses, but how did they get the total misses?

我也知道数组B的未命中率:(N ^ 3)/(N ^ 3)和A:(N ^ 2/16)/ N ^ 3

I also know the miss rate of array B: (N^3)/(N^3) and A: (N^2/16)/N^3. Could someone please explain to me how they got the total misses here too?

推荐答案

始终以以下粒度访问/管理缓存:一条线。这里的行大小是64B。这意味着在一条高速缓存行中,一条高速缓存行中将有16个元素(64B / 4B = 16)。接下来要注意的是,第17个元素都在新行中,一旦将某行缓存到缓存中,它就有可能产生15次匹配。

Cache is always accessed / managed with the granularity of a line. Here the line size is 64B. That means that in one cache line there will be 16 elements in one cache line (64B / 4B = 16). Next thing that is important here is every 17th element is in a new line, and once a line is brought to cache, it can possibly give 15 hits.

已经解决了这个问题,让我们看一下访问模式。
A [0] [0],B [0] [0],C [0] [0],A [0] [1],B [1] [0],C [0] [0 ],A [0] [2],B [2] [0],C [0] [0],...... A [0] [16],B [16] [0],C [ 0] [0],A [0] [17],B [17] [0],C [0] [0],......等等

Having got that out of the way, let us look at the access pattern. A[0][0], B[0][0], C[0][0], A[0][1], B[1][0], C[0][0], A[0][2], B[2][0], C[0][0], ...... A[0][16], B[16][0], C[0][0], A[0][17], B[17][0], C[0][0], ...... and so on

现在,由于一行中有16个元素,因此可以安全地假定元素在存储器中的行主要位置A [0] [0]-A [0] [15]将属于第一条高速缓存行(我们称其为AL1),并类似地让B [0] [0]-B [0] [15]属于BL1行。根据这些信息,我们可以根据缓存行来编写访问模式。
AL1,BL1,CL1,AL1,BL129,CL1,AL1,BL257,CL1,..... AL1,BL2049,CL1,AL2,BL2176,CL1,....

Now Since a line has 16 elements and it is safe to assume a row-major placement of elements in the memory A[0][0] - A[0][15] will belong to one the first cache line (let us call this AL1) and similarly let B[0][0] - B[0][15] belong to line BL1. From this information we can write the access pattern in terms of cache lines. AL1, BL1, CL1, AL1, BL129, CL1, AL1, BL257, CL1, ..... AL1, BL2049, CL1, AL2, BL2176, CL1,.... and so on.

现在,缓存大小为32kB,这意味着缓存可以容纳512行。由于这是L1,因此我们假设它是双向关联缓存。因此,有256套。这意味着在数组的每256行之后,第257行映射到与第1行相同的集合。

Now the cache size is 32kB that means the cache can hold 512 lines. Since this is L1, let us assume it is a two-way associative cache. Therefore there are 256 sets. This means that after every 256 lines of an array, the 257th line maps to the same set as the 1st line.

访问设置为1的行的缓存内容如下所示:

The cache contents upon access to lines mapping to set 1 looks like this

第一次迭代-内部

Access to - AL1 - Miss
 AL1 
Access to - BL1 - Miss
 AL1
 BL1
Access to - CL1 - Miss
 CL1
 BL1

第二次迭代-内部

Access to - AL1 - Miss
 CL1
 AL1
Access to - CL1 - Hit
 CL1
 AL1

第三次迭代-内部

Access to - AL1 - Hit
 CL1
 AL1
Access to - BL257 - Miss
 BL257
 AL1
Access to - CL1 - Miss
 BL257
 CL1

第4次迭代-内部

Access to - AL1 - Miss
 AL1
 CL1
Access to - CL1 - Hit
 AL1
 CL1

第5次迭代-内部

Access to - AL1 - Hit
 CL1
 AL1
Access to - BL513 - Miss
 BL513
 AL1
Access to - CL1 - Miss
 BL513
 CL1



. . .

第16次迭代-内部

Access to - AL1 - Miss
 AL1
 CL1
Access to - CL1 - Hit
 AL1
 CL1

第17次迭代-内部

Access to - BL2049 - Miss
 BL2049
 CL1
Access to - CL1 - Hit
 BL2049
 CL1

第18次迭代-内部

Access to - CL1 - Hit
 BL2049
 CL1

对于中间循环的第一次迭代。对于集合1,

For the 1st iteration of the middle loop. We have for set 1,

A -> M, M, H, M, H, M, H, M, H, M, H, M, H , M, H, M, ~ , ~ , ~ , ~ , ....... 
=> after 2048 iterations - 7 hits 9 misses, 16 accesses

B -> M, ~ , M, ~ ,  M, ~ , ........
=> after 2048 iterations - 0 Hits, 1024 misses, 1024 accesses

C -> M, H, M, H, M, H, M, H, M, H, M, H, M, H, M, H, H , H, H , H, ......
=> after 2048 iterations - 2040 hits, 8 misses, 2048 accesses

对于中间循环的第2次至第16次迭代,

for 2nd - 16th iteration of the middle loop,

Exact hit / miss / access pattern as before.

中间循环的第17-2048次迭代,

for 17th - 2048th iteration of the middle loop,

A -> same as before
B -> No accesses
C -> No accesses

总而言之,对于外循环的1次迭代,我们具有集合1,

To summarize - for 1 iteration of the outer loop, we have for set 1,

A -> N*9 misses , N * 16 accesses
B -> N/2 * 16 Misses , N/2 * 16 accesses
C -> 8 * 16 Misses , N * 16 accesses

在外循环中,我们可以看到每个替代迭代与数组C和A的第一次迭代具有相同的命中/未命中/访问模式。外部循环的每次迭代将具有与数组B的第一次迭代相同的命中/未命中/访问模式。

In the outer loop, we can see that every alternate iteration will have the same hit / miss / access pattern as the 1st iteration (summarized above) for arrays C and A. Every iteration of the outer loop will have the same hit / miss / access pattern a the 1st iteration for array B.

因此,(最后!)

对于Set 1,在程序结束时,我们有

For Set 1, at the end of the program, we have

A -> N^2 * 9 / 2 misses, N^2 * 8 accesses
B -> N^2 * 8 misses, N^2 * 8 accesses
C -> N * 64 misses, N^2 * 8 accesses

所有组的访问方式均相似。因此,到程序结束时,我们在所有集合中都有

The access pattern will be similar across all sets. Therefore by the end of the program we have, across all sets

A -> N^2 * 1152 misses, N^3 accesses
B -> N^3 misses, N^3 accesses
C -> N^2 * 8 misses, N^3 accesses

我知道这与指示中的不同问题。我不知道如何。我很高兴听到其他回应。

I know this is different from what is indicated in the question. I could not figure out how. I will be glad to hear other responses.

这篇关于阵列的缓存未命中率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆