How to find the L1 cache line size with IO timing measurements?

Question

As a school assignment, I need to find a way to get the L1 data cache line size without reading config files or using API calls. I'm supposed to analyze memory access read/write timings to get this information. So how might I do that?

In an incomplete attempt at another part of the assignment (finding the levels and sizes of the caches), I have:

for (i = 0; i < steps; i++) {
    arr[(i * 4) & lengthMod]++;
}

I was thinking maybe I just need to vary the (i * 4) part on line 2? So once I exceed the cache line size, I might need to replace the line, which takes some time? But is it that straightforward? The required block might already be in the cache somewhere? Or perhaps I can still count on the fact that if steps is large enough, it will still work out quite accurately?

Update

Here's an attempt on GitHub ... main part below:

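// repeatedly access/modify data, varying the STRIDE (s counts ints, not bytes)
for (int s = 4; s <= MAX_STRIDE / sizeof(int); s *= 2) {
    start = wall_clock_time();
    for (unsigned int k = 0; k < REPS; k++) {
        // assumes lengthMod == (array length - 1) with a power-of-two length,
        // so the & wraps the index back into the array
        data[(k * s) & lengthMod]++;
    }
    end = wall_clock_time();
    // assumes wall_clock_time() returns nanoseconds
    timeTaken = ((float)(end - start)) / 1000000000;
    // s * sizeof(int) is a size_t, so it needs %zu rather than %d
    printf("%zu, %1.2f \n", s * sizeof(int), timeTaken);
}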

The problem is that there doesn't seem to be much difference between the timings. FYI, since it's for the L1 cache, I have SIZE = 32 K (the size of the array).

Recommended Answer

Allocate a BIG char array (make sure it is too big to fit in L1 or L2 cache). Fill it with random data.
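
A minimal sketch of this first step in standard C. `alloc_random_buffer` and `BUF_LEN` are hypothetical names, and 64 MiB is only an assumed size that comfortably exceeds typical L1 and L2 caches; adjust it for your machine. The quality of `rand()` does not matter here; the data only needs to be non-trivial so the later sum cannot be predicted.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed "big enough" size: 64 MiB is far larger than typical L1/L2 caches. */
#define BUF_LEN ((size_t)64 * 1024 * 1024)

/* Hypothetical helper: allocate `len` bytes and fill them with pseudo-random
 * data from rand(). Returns NULL if the allocation fails. */
unsigned char *alloc_random_buffer(size_t len, unsigned seed)
{
    assert(len > 0);
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    srand(seed);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(rand() & 0xFF);
    return buf;
}
```

Typical use would be `unsigned char *buf = alloc_random_buffer(BUF_LEN, 12345u);`, followed by `free(buf)` once the benchmark is done.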

Start walking over the array in steps of n bytes. Do something with the retrieved bytes, like summing them.

Benchmark and calculate how many bytes/second you can process with different values of n, starting from 1 and counting up to 1000 or so. Make sure that your benchmark prints out the calculated sum, so the compiler can't possibly optimize the benchmarked code away.
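
The walking and benchmarking steps could look roughly like the sketch below. It assumes POSIX `clock_gettime` with `CLOCK_MONOTONIC` (hence the `_POSIX_C_SOURCE` define). `strided_sum`, `bytes_per_second` and `run_benchmark` are hypothetical names. "Bytes/second" here counts the bytes actually read and summed, one per stride step.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sum one byte every `n` bytes across the whole buffer. */
unsigned long long strided_sum(const unsigned char *buf, size_t len, size_t n)
{
    assert(n > 0);
    unsigned long long sum = 0;
    for (size_t i = 0; i < len; i += n)
        sum += buf[i];
    return sum;
}

/* Time one strided pass; return bytes read per second and store the sum. */
double bytes_per_second(const unsigned char *buf, size_t len, size_t n,
                        unsigned long long *sum_out)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    unsigned long long sum = strided_sum(buf, len, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    size_t touched = (len + n - 1) / n;   /* one byte read per stride step */
    *sum_out = sum;
    return secs > 0.0 ? (double)touched / secs : 0.0;
}

/* Try every stride from 1 to max_n bytes. */
void run_benchmark(const unsigned char *buf, size_t len, size_t max_n)
{
    for (size_t n = 1; n <= max_n; n++) {
        unsigned long long sum;
        double bps = bytes_per_second(buf, len, n, &sum);
        /* Printing the sum keeps the compiler from discarding the loop. */
        printf("n=%zu  %.3e bytes/s  sum=%llu\n", n, bps, sum);
    }
}
```

On a large random buffer, `run_benchmark(buf, len, 1000)` prints one line per stride. A single timed pass per `n` is noisy, so repeating each stride a few times and keeping the best result is a reasonable refinement.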

When n == your cache line size, each access will require reading a new line into the L1 cache. So the benchmark results should get slower quite sharply at that point.
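
To see why the curve should bend at the line size, here is a small counting model, not a measurement. It assumes a hypothetical line size `line` and a buffer that starts on a line boundary; `lines_touched` and `accesses` are illustrative names. For n up to the line size, one pass still touches every line, so only about n/line of the accesses bring in a new line. From n == line onward, every access needs a new line.

```c
#include <assert.h>
#include <stddef.h>

/* Number of accesses in one pass: one per stride step. */
size_t accesses(size_t len, size_t n)
{
    assert(n > 0);
    return (len + n - 1) / n;
}

/* Distinct lines touched in one pass with stride n over len bytes, for an
 * ASSUMED line size `line`; the buffer is assumed to start on a line boundary. */
size_t lines_touched(size_t len, size_t n, size_t line)
{
    assert(n > 0 && line > 0);
    size_t count = 0;
    size_t last = (size_t)-1;        /* no line seen yet */
    for (size_t i = 0; i < len; i += n) {
        size_t l = i / line;         /* line index of this byte */
        if (l != last) {             /* offsets only increase, so a new index is a new line */
            count++;
            last = l;
        }
    }
    return count;
}
```

With `line = 64`, a 4096-byte buffer gives 64 lines both at n = 1 (4096 accesses) and at n = 64 (64 accesses), but only 32 lines at n = 128, where lines touched equals accesses. On real hardware, prefetchers can hide part of this cost for regular strides, so the measured knee may be softer than the model suggests.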

If the array is big enough, by the time you reach the end, the data at the beginning of the array will already be out of cache again, which is what you want. So after you increment n and start again, the results will not be affected by having needed data already in the cache.
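
The eviction argument can be illustrated with a toy model. It assumes a fully associative cache with true LRU replacement, which real L1/L2 caches only approximate: they are set-associative and usually use pseudo-LRU. `ToyCache`, `toy_init`, `toy_access` and `stream_pass` are hypothetical names. The 512-way size stands for an assumed 32 KiB cache with 64-byte lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAYS 1024

/* Toy fully associative cache with true LRU replacement (an assumption). */
typedef struct {
    size_t ways;                     /* capacity in lines */
    size_t tag[MAX_WAYS];            /* line number held in each way */
    unsigned long stamp[MAX_WAYS];   /* time of last use */
    bool valid[MAX_WAYS];
    unsigned long clock;
} ToyCache;

void toy_init(ToyCache *c, size_t ways)
{
    assert(ways > 0 && ways <= MAX_WAYS);
    c->ways = ways;
    c->clock = 0;
    for (size_t i = 0; i < ways; i++)
        c->valid[i] = false;
}

/* Access one line; returns true on a hit, false on a miss (which fills it). */
bool toy_access(ToyCache *c, size_t line)
{
    c->clock++;
    for (size_t i = 0; i < c->ways; i++) {
        if (c->valid[i] && c->tag[i] == line) {
            c->stamp[i] = c->clock;  /* refresh recency on a hit */
            return true;
        }
    }
    /* Miss: use an empty way if there is one, else evict the LRU way. */
    size_t victim = 0;
    for (size_t i = 0; i < c->ways; i++) {
        if (!c->valid[i]) {
            victim = i;
            break;
        }
        if (c->stamp[i] < c->stamp[victim])
            victim = i;
    }
    c->tag[victim] = line;
    c->stamp[victim] = c->clock;
    c->valid[victim] = true;
    return false;
}

/* Touch lines 0 .. nlines-1 in order, as a sequential walk does; count hits. */
size_t stream_pass(ToyCache *c, size_t nlines)
{
    size_t hits = 0;
    for (size_t l = 0; l < nlines; l++)
        if (toy_access(c, l))
            hits++;
    return hits;
}
```

Streaming 1024 lines through the 512-line toy cache leaves none of the early lines resident, so a second pass gets zero hits. With only 256 lines, the second pass hits every time. That is the kind of leftover-cache effect the big array is meant to avoid.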
