C Program to determine Levels & Size of Cache


Problem description

Full rewrite/update for clarity (and for your sanity, since it had become a bit too long) ... (old post)

For an assignment, I need to find the cache levels (L1, L2, ...) and the size of each cache. Based on the hints given and what I have found so far, I think the idea is to create arrays of different sizes, read them, and time those operations:

sizes = [1k, 4k, 256K, ...]
foreach size in sizes 
    create array of `size`

    start timer
    for i = 0 to n // just keep accessing array
        arr[(i * 16) % arr.length]++ // i * 16 supposed to modify every cache line ... see link
    record/print time

Update (28 Sep, 6:57 PM UTC+8)

See also full source

OK, following @mah's advice, I may have fixed the SNR problem, and I also found a way to time my code (wall_clock_time, from a lab example).

However, I seem to be getting incorrect results. I am on an Intel Core i3 2100: [SPECS]

  • L1: 2 x 32K
  • L2: 2 x 256K
  • L3: 3MB

The results I got, in a graph:

lengthMod: 1KB to 512K

The base of the first peak is at 32K, which is reasonable. The second is at 384K. Why? I was expecting 256K.

lengthMod: 512K to 4MB

And why is this range such a mess?

I also read about prefetching and interference from other applications, so I closed as many programs as possible while the script was running. Across multiple runs, the data for 1MB and above is consistently this messy.

Recommended answer

The time it takes to measure your time (that is, just to call the clock() function) is many, many times greater than the time it takes to perform arr[(i*16)&lengthMod]++. This extremely low signal-to-noise ratio (among other likely pitfalls) makes your plan unworkable. A large part of the problem is that you're trying to measure a single iteration of the loop; the sample code you linked measures a full set of iterations (read the clock before starting the loop, read it again after leaving the loop, and do not use printf() inside the loop).

If your loop is large enough, you might be able to overcome the signal-to-noise ratio problem.

As to "what element is being incremented": arr is the address of the buffer (1024 * 1024 ints in the code below); arr[(i * 16) & lengthMod]++ uses (i * 16) & lengthMod as an offset from that address, and that offset locates the int that gets incremented. You're performing a shift (i * 16 will turn into i << 4), a logical AND, an addition, and then either a read/add/write or a single increment, depending on your CPU.

As described, your code suffers from a poor SNR (signal-to-noise ratio) because a memory access (cached or not) is so much faster than a function call made just to measure the time. To get the timings you're currently getting, I assume you modified the code to look something like:

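#include <stdio.h>
#include <time.h>

/* static: 1M ints (4 MB with 4-byte int) can be too large for the stack */
static int arr[1024 * 1024];

int main(void) {
    int steps = 64 * 1024 * 1024;
    int lengthMod = (1024 * 1024) - 1;
    int i;
    double timeTaken;
    clock_t start;

    start = clock();                    /* read the clock once, before the loop */
    for (i = 0; i < steps; i++) {
        arr[(i * 16) & lengthMod]++;
    }
    /* read the clock again only after the loop has finished */
    timeTaken = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("Time for %d: %.12f\n", i, timeTaken);
    return 0;
}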

This moves the measurement outside the loop, so you're not measuring a single access (which would really be impossible) but rather `steps` accesses.

You're free to increase steps as needed, and this will have a direct impact on your timings. Since the times you're getting are too close together, and in some cases even inverted (your time oscillates between sizes, which is not likely to be caused by the cache), you might try changing the value of steps to 256 * 1024 * 1024 or even larger.

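NOTE: You can make steps as large as will fit into a signed int (which should be large enough), since the logical AND ensures that you wrap around in your buffer. One caveat: i * 16 itself overflows a signed int once i exceeds INT_MAX / 16 (128M iterations with a 32-bit int), so for larger step counts compute the index in an unsigned type.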
