C Program to determine Levels & Size of Cache
Problem description
Completely rewritten/updated for clarity (and for your sanity, since it was a bit too long) ... (old post)
For an assignment, I need to find the cache levels (L1, L2, ...) and the size of each cache. Given the hints and what I have found so far, I think the idea is to create arrays of different sizes, read them, and time these operations:
    sizes = [1k, 4k, 256K, ...]
    foreach size in sizes
        create array of `size`
        start timer
        for i = 0 to n // just keep accessing array
            arr[(i * 16) % arr.length]++ // i * 16 supposed to modify every cache line ... see link
        record/print time
Update (Sep 28, 6:57 PM UTC+8)
See also full source
OK, now following @mah's advice, I might have fixed the SNR problem ... and I also found a way to time my code (`wall_clock_time` from a lab example).
However, I seem to be getting incorrect results. I am on an Intel Core i3 2100: [SPECS]
- L1: 2 x 32K
- L2: 2 x 256K
- L3: 3MB
The results I got, in a graph:
lengthMod: 1KB to 512K
The base of the 1st peak is at 32K, which is reasonable. The 2nd is at 384K. Why? I was expecting 256K.
lengthMod: 512K to 4MB
Then why is this range such a mess?
I also read about prefetching and interference from other applications, so I closed as many things as possible while the program was running. Still, across multiple runs, the data for 1MB and above is consistently messy. Why?
Recommended answer
The time it takes to measure the time (that is, just to call the `clock()` function) is many, many times greater than the time it takes to perform `arr[(i*16)&lengthMod]++`. This extremely low signal-to-noise ratio (among other likely pitfalls) makes your plan unworkable. A large part of the problem is that you're trying to measure a single iteration of the loop; the sample code you linked measures a full set of iterations (read the clock before starting the loop, read it again after leaving the loop, and do not use `printf()` inside the loop).
If your loop is large enough, you might be able to overcome the signal-to-noise ratio problem.
As to "what element is being incremented": `arr` is the address of a 1MB buffer; `arr[(i * 16) & lengthMod]++;` causes `(i * 16) & lengthMod` to generate an offset from that address, and that offset locates the int that gets incremented. You're performing a shift (`i * 16` will turn into `i << 4`), a logical AND, an addition, and then either a read/add/write or a single increment, depending on your CPU.
As described, your code suffers from a poor SNR (signal-to-noise ratio) because a memory access (cached or not) is so much faster than a function call made just to measure the time. To get the timings you're currently getting, I assume you modified the code to look something like:
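    #include <stdio.h>
    #include <time.h>

    /* static: 1M ints may not fit on the stack, and static storage is zero-initialized */
    static int arr[1024 * 1024];

    int main(void) {
        int steps = 64 * 1024 * 1024;
        int lengthMod = (1024 * 1024) - 1;  /* bit mask; the array length is a power of two */
        int i;
        double timeTaken;
        clock_t start;

        start = clock();  /* read the clock once, before the loop */
        for (i = 0; i < steps; i++) {
            arr[(i * 16) & lengthMod]++;
        }
        /* read the clock once more, after the loop */
        timeTaken = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("Time for %d: %.12f\n", i, timeTaken);
        return 0;
    }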
This moves the measurement outside the loop, so you're not measuring a single access (which would really be impossible) but rather `steps` accesses.
You're free to increase `steps` as needed, and this will have a direct impact on your timings. Since the times you're getting are too close together, and in some cases even inverted (your time oscillates between sizes, which is not likely caused by the cache), you might try changing the value of `steps` to `256 * 1024 * 1024` or even larger.
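NOTE: You can make `steps` as large as you can fit into a signed int (which should be large enough), since the logical AND ensures that you wrap around in your buffer. Keep an eye on `i * 16`, though: with a 32-bit signed int the product overflows once `i` reaches `128 * 1024 * 1024`, so use an unsigned index for step counts that large.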