clflush not flushing the instruction cache

Question

Consider the following code snippet:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>   /* PRIu64 */
#define ARRAYSIZE(arr) (sizeof(arr)/sizeof(arr[0]))


/* Evict the cache line containing p from the cache hierarchy. */
inline void
clflush(volatile void *p)
{
    asm volatile ("clflush (%0)" :: "r"(p));
}

/* cpuid serializes execution, then rdtsc reads the time-stamp counter. */
inline uint64_t
rdtsc()
{
    unsigned long a, d;
    asm volatile ("cpuid; rdtsc" : "=a" (a), "=d" (d) : : "ebx", "ecx");
    return a | ((uint64_t)d << 32);
}

inline int func() { return 5;}

/* Time a single call to func and print the elapsed TSC ticks. */
inline void test()
{
    uint64_t start, end;
    start = rdtsc();
    func();
    end = rdtsc();
    printf("%" PRIu64 " ticks\n", end - start);
}

void flushFuncCache()
{
    // Assumes func is no larger than 320 bytes (five 64-byte cache lines).
    char* fPtr = (char*)func;
    clflush(fPtr);
    clflush(fPtr+64);
    clflush(fPtr+128);
    clflush(fPtr+192);
    clflush(fPtr+256);
}

int main(int ac, char **av)
{
    test();
    printf("Function must be cached by now!\n");
    test();
    flushFuncCache();
    printf("Function flushed from cache.\n");
    test();
    printf("Function must be cached again by now!\n");
    test();

    return 0;
}

Here, I am trying to flush the instruction cache to remove the code for 'func', and I expect a performance overhead on the next call to func, but my results don't agree with my expectations:

858 ticks
Function must be cached by now!
788 ticks
Function flushed from cache.
728 ticks
Function must be cached again by now!
710 ticks

I was expecting CLFLUSH to also flush the instruction cache, but apparently it does not. Can someone explain this behavior or suggest how to achieve the desired behavior?

Answer

Your code does almost nothing in func, and the little it does gets inlined into test and then probably optimized away, since you never use the return value.
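One way to make sure the timed region really contains a call to func is sketched below; it is not from the original answer. It assumes x86-64 with GCC or Clang (for __attribute__((noinline)) and the inline-asm syntax), and the names func_ptr, sink, rdtsc_serialized and time_func_call are hypothetical. Calling through a volatile function pointer stops the compiler from inlining or eliding the call, and storing the result in a volatile object keeps the return value live.

```c
#include <stdint.h>

/* GCC/Clang attribute: emit func as a real, out-of-line function. */
static __attribute__((noinline)) int func(void) { return 5; }

/* A volatile function pointer hides the callee from the optimizer,
   so the call can be neither inlined nor removed. */
static int (*volatile func_ptr)(void) = func;

/* Writing the result to a volatile object keeps the return value "used". */
static volatile int sink;

/* cpuid (leaf 0) serializes execution, then rdtsc reads the TSC.
   cpuid also overwrites ebx and ecx, so they are listed as clobbers. */
static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("cpuid\n\trdtsc"
                         : "=a"(lo), "=d"(hi)
                         : "0"(0u)
                         : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Time one genuine call to func. */
uint64_t time_func_call(void)
{
    uint64_t start = rdtsc_serialized();
    sink = func_ptr();
    uint64_t end = rdtsc_serialized();
    return end - start;
}
```

Calling time_func_call() from main and printing the result, as the original test() does, times a real call; the reading will still include the cost of cpuid itself.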

gcc -O3 gives me:

0000000000400620 <test>:
  400620:       53                      push   %rbx
  400621:       0f a2                   cpuid
  400623:       0f 31                   rdtsc
  400625:       48 89 d7                mov    %rdx,%rdi
  400628:       48 89 c6                mov    %rax,%rsi
  40062b:       0f a2                   cpuid
  40062d:       0f 31                   rdtsc
  40062f:       5b                      pop    %rbx
  ...

So you're measuring the time taken by the two mov instructions, which are very cheap in hardware; your measurement is probably showing the latency of cpuid, which is relatively expensive.
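A quick way to check how much of a tick count is just the timing harness is to time an empty interval and keep the minimum over many runs. The sketch below assumes x86-64 with GCC or Clang; rdtsc_serialized and timer_overhead are hypothetical names. Taking the minimum filters out one-off noise such as interrupts, so the result approximates the fixed cost of the cpuid/rdtsc pair.

```c
#include <stdint.h>

/* cpuid (leaf 0) serializes, then rdtsc reads the time-stamp counter. */
static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("cpuid\n\trdtsc"
                         : "=a"(lo), "=d"(hi)
                         : "0"(0u)
                         : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Smallest observed cost of an empty rdtsc_serialized() pair over `runs`
   attempts; returns UINT64_MAX if runs <= 0. */
uint64_t timer_overhead(int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = rdtsc_serialized();
        uint64_t end = rdtsc_serialized();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

For example, printing timer_overhead(1000) next to the question's tick counts gives a rough sense of how much of each reading is harness overhead rather than func.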

Worse, your clflush calls would actually flush test as well, which means you pay the re-fetch penalty the next time you execute it. That happens outside the rdtsc pair, so it isn't measured. The measured code, on the other hand, follows sequentially, so fetching test would probably also fetch the flushed code you are measuring, which could therefore already be cached by the time you measure it.
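A sketch of a measurement that tries to avoid both problems (it is not part of the original answer): flush only the cache line that holds the measured function, and keep that function and the timing code on separate cache lines. It assumes x86-64, GCC or Clang (for the noinline, aligned and always_inline attributes), SSE2 intrinsics from <emmintrin.h>, and 64-byte cache lines; cold_func, cold_func_ptr, cold_sink, flush_range, rdtsc_serialized and time_cold_call are hypothetical names, and flushing a single line assumes cold_func's few bytes of code fit in one line.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */

#define CACHE_LINE 64    /* assumption: 64-byte cache lines */

/* The measured function: out of line and starting on its own cache line. */
static __attribute__((noinline, aligned(CACHE_LINE))) int cold_func(void)
{
    return 5;
}

/* Volatile pointer and sink so the call is neither inlined nor dropped. */
static int (*volatile cold_func_ptr)(void) = cold_func;
static volatile int cold_sink;

/* Flush every cache line overlapping [p, p + len). always_inline keeps the
   loop inside the caller instead of emitting it as a separate function. */
static inline __attribute__((always_inline)) void
flush_range(const void *p, size_t len)
{
    uintptr_t addr = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)p + len;
    for (; addr < end; addr += CACHE_LINE)
        _mm_clflush((const void *)addr);
    _mm_mfence();   /* order the flushes before the code that follows */
}

/* cpuid (leaf 0) serializes, then rdtsc reads the time-stamp counter. */
static inline __attribute__((always_inline)) uint64_t
rdtsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("cpuid\n\trdtsc"
                         : "=a"(lo), "=d"(hi)
                         : "0"(0u)
                         : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Evict cold_func's line, then time one call to it. This function is also
   line-aligned, so it never shares a cache line with cold_func. */
__attribute__((noinline, aligned(CACHE_LINE))) uint64_t time_cold_call(void)
{
    flush_range((const void *)cold_func, CACHE_LINE);
    uint64_t start = rdtsc_serialized();
    cold_sink = cold_func_ptr();
    uint64_t end = rdtsc_serialized();
    return end - start;
}
```

Comparing time_cold_call() with the same call made without the flush_range line would be one way to look for a re-fetch penalty. Hardware instruction prefetching can still blur the result, so any single reading should be treated with caution.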
