Writing a program to get (L1) cache line size

Question

As a school assignment, I need to find a way to get the L1 data cache line size without reading config files or using API calls. I am supposed to use the timings of memory reads and writes to work this out. How might I do that?

In an incomplete attempt at another part of the assignment, finding the number of cache levels and their sizes, I have:

for (i = 0; i < steps; i++) {
    arr[(i * 4) & lengthMod]++;
}

I was thinking that maybe I just need to vary line 2, the (i * 4) part? Once the stride exceeds the cache line size, each access might need to replace a line, which takes some time. But is it that straightforward? The required block might already be in memory somewhere. Or perhaps I can still count on the fact that, with a large enough number of steps, it will work out quite accurately?

Update

Here's an attempt on GitHub ... main part below:

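// repeatedly access/modify data, varying the STRIDE
// s is the stride in ints, so the byte stride is s * sizeof(int)
for (int s = 4; s <= MAX_STRIDE/sizeof(int); s*=2) {
    start = wall_clock_time();   // nanoseconds, judging by the division below
    for (unsigned int k = 0; k < REPS; k++) {
        // lengthMod masks the index so it wraps around inside the array
        data[(k * s) & lengthMod]++;
    }
    end = wall_clock_time();
    timeTaken = ((float)(end - start))/1000000000;   // convert to seconds
    // s * sizeof(int) has type size_t, so print it with %zu rather than %d
    printf("%zu, %1.2f \n", s * sizeof(int), timeTaken);
}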

The problem is that there doesn't seem to be much difference between the timings. FYI, since this is for the L1 cache, I have SIZE = 32 K (the size of the array).

Recommended answer

Allocate a BIG char array (make sure it is too big to fit in L1 or L2 cache). Fill it with random data.

Start walking over the array in steps of n bytes. Do something with the retrieved bytes, like summing them.

Benchmark and calculate how many bytes/second you can process with different values of n, starting from 1 and counting up to 1000 or so. Make sure that your benchmark prints out the calculated sum, so the compiler can't possibly optimize the benchmarked code away.

While n is smaller than the cache line size, several consecutive accesses land in the same line, so only some of them have to fetch a new one. When n == your cache line size, each access will require reading a new line into the L1 cache. So the benchmark results should get slower quite sharply at that point.

If the array is big enough, by the time you reach the end, the data at the beginning of the array will already be out of cache again, which is what you want. So after you increment n and start again, the results will not be affected by having needed data already in the cache.
