How to come up with a high cache miss rate example?

Question

I'm trying to come up with an example program that has a high cache miss rate. I thought I could try accessing a matrix column by column, like so:

#include <stdlib.h>

int main(void)
{
    int i, j, k;

    int w = 1000;
    int h = 1000;

    int **block = malloc(w * sizeof(int*));
    for (i = 0; i < w; i++) {
        block[i] = malloc(h * sizeof(int));
    }

    for (k = 0; k < 10; k++) {
        for (i = 0; i < w; i++) {
            for (j = 0; j < h; j++) {
                block[j][i] = 0;
            }
        }
    }

    return 0;
}

When I compile this with the -O0 flag and run it with perf stat -r 5 -B -e cache-references,cache-misses ./a.out, it gives me:

 Performance counter stats for './a.out' (5 runs):

    715,463 cache-references                                      ( +-  0.42% )
    527,634 cache-misses          #   73.747 % of all cache refs  ( +-  2.53% )

0.112001160 seconds time elapsed                                  ( +-  1.58% )

This is good enough for my purposes. However, if I change the matrix size to 2000x2000, it gives:

 Performance counter stats for './a.out' (5 runs):

  6,364,995 cache-references                                      ( +-  2.32% )
  2,534,989 cache-misses          #   39.827 % of all cache refs  ( +-  0.02% )

0.461104903 seconds time elapsed                                  ( +-  0.92% )

And if I increase it even further, to 3000x3000, I get:

 Performance counter stats for './a.out' (5 runs):

 59,204,028 cache-references                                      ( +-  1.36% )
  5,662,629 cache-misses          #    9.565 % of all cache refs  ( +-  0.11% )

1.116573625 seconds time elapsed                                  ( +-  0.32% )

This is strange, because I would expect the cache miss rate to rise as the size increases. I need something that is as platform independent as possible. My computer architecture class was a long time ago, so any insight would be welcome.

Note

I said I need something relatively platform independent, but these are my specs:

  • Intel® Core™ i5-2467M
  • 4 GiB RAM
  • 64-bit Ubuntu 12.04

Recommended answer

Beware of automatic prefetching in modern CPUs: it can often detect strided accesses. Perhaps try a random access pattern, e.g.:

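#include <stdlib.h>

int main(void)
{
    int i;

    int n = 1000 * 1000;

    int *block = malloc(n * sizeof(int));
    if (block == NULL) {
        return 1;
    }

    /* Write to n / 10 randomly chosen elements so the prefetcher
       cannot predict the next address. */
    for (i = 0; i < n / 10; i++) {
        int ri = rand() % n;
        block[ri] = 0;
    }

    free(block);
    return 0;
}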
