Is there code that results in 50% branch prediction misses?

Problem description

Question:

I'm trying to figure out how to write code (C preferred, ASM only if there is no other solution) that makes the branch predictor miss in 50% of cases.

So it has to be a piece of code that is "immune" to compiler optimizations related to branching, and no hardware branch predictor should do better than 50% (a coin toss). An even greater challenge is being able to run the code on multiple CPU architectures and get the same 50% miss ratio.

I managed to write code that reaches a 47% branch miss ratio on an x86 platform. I suspect the missing 3% could come from:


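  • Program launch overhead, which has some branching in it (though very little)

  • Profiler overhead: basically, an interrupt is raised for every counter read, so this might add extra predictable branches

  • Work running in the background that contains loops and predictable branches
  • System calls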

I wrote my own random number generator to avoid calls to a rand() whose implementation might contain hidden predictable branches. It can also use rdrand when available. Latency does not matter to me.

Questions:

  1. Can I do better than my version of the code? "Better" means getting a higher branch misprediction rate and the same results on all CPU architectures.
  2. Can this code be predicated? What would that mean?

The code:

#include <stdio.h>
#include <time.h>

#define RDRAND
#define LCG_A   1103515245
#define LCG_C   22345
#define LCG_M   2147483648
#define ULL64   unsigned long long

ULL64 generated;

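ULL64 rand_lcg(ULL64 seed)
{
#ifdef RDRAND
    /* x86-only hardware RNG. The carry flag, which signals success, is not
       checked here; a failed RDRAND leaves 0 in the destination register. */
    ULL64 result = 0;
    asm volatile ("rdrand %0;" : "=r" (result));
    return result;
#else
    /* Portable fallback: LCG modulo 2^31 */
    return (LCG_A * seed + LCG_C) % LCG_M;
#endif
}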

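/* Draw a value in [0, 1024) and redraw until it is below 512; the retry
   check is itself a data-dependent branch. rand_rec2 performs the same
   test with the condition written the other way round. */
ULL64 rand_rec1()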
{
    generated = rand_lcg(generated) % 1024;

    if (generated < 512)
        return generated;
    else return rand_rec1();
}

ULL64 rand_rec2()
{
    generated = rand_lcg(generated) % 1024;

    if (!(generated >= 512))
        return generated;
    else return rand_rec2();
}

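/* The branch under test: taken or not depending on the low bit of a fresh
   random number. Each arm also calls one of the rejection samplers above. */
#define BROP(num, sum)                  \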
    num = rand_lcg(generated);          \
    asm volatile("": : :"memory");      \
    if (num % 2)                        \
        sum += rand_rec1();             \
    else                                \
        sum -= rand_rec2();

#define BROP5(num, sum)     BROP(num, sum) BROP(num, sum) BROP(num, sum) BROP(num, sum) BROP(num, sum)
#define BROP25(num, sum)    BROP5(num, sum) BROP5(num, sum) BROP5(num, sum) BROP5(num, sum) BROP5(num, sum)
#define BROP100(num, sum)   BROP25(num, sum) BROP25(num, sum) BROP25(num, sum) BROP25(num, sum)

int main()
{
    int i = 0;
    int iterations = 500000;    
    ULL64 num = 0;
    ULL64 sum = 0;

    generated = rand_lcg(0) % 54321;

    for (i = 0; i < iterations; i++)
    {
        BROP100(num, sum);
        // ... repeat the line above 10 times
    }

    printf("Sum = %llu\n", sum);
}

Update V1:

Following usr's suggestion, I generated various patterns by varying the LCG_C parameter from the command line in a script. I was able to reach a 49.67% BP miss rate. That is enough for my purpose, and I now have a methodology to reproduce this on various architectures.

Recommended answer

If you know how the branch predictor works, you can get to 100% misprediction: each time, take the predictor's expected prediction and do the opposite. The problem is that we don't know how it is implemented.

I have read that typical predictors are able to predict patterns such as 0,1,0,1 and so on, but I'm sure there is a limit to how long the pattern can be. My suggestion would be to try every pattern of a given length (such as 4) and see which one comes closest to your target percentage. You should be able to target both 50% and 100% and come very close. This profiling needs to be done once per platform, or at runtime.

I doubt that 3% of all branches are in system code, as you suggest: the kernel does not add 3% overhead to purely CPU-bound user code. To minimize interference anyway, increase the scheduling priority to the maximum.

You can take the RNG out of the game by generating random data once and iterating over the same data many times. The branch predictor is unlikely to detect the repetition (although in principle it could).

I would implement this by filling a bool[1 << 20] with a zero-one pattern like the one I described. Then you can run the following loop over it many times:

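long long sum0 = 0, sum1 = 0;
for (int rep = 0; rep < reps; rep++) {                  // many passes over the same data
    for (size_t i = 0; i < ((size_t)1 << 20); i++) {   // unroll this a lot
        if (array[i]) sum0++;
        else sum1++;
    }
}
// print both sums to make sure the computation is not optimized out
printf("%lld %lld\n", sum0, sum1);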

You'll need to examine the disassembly to make sure that the compiler did not do anything clever.

I don't see why your current, complicated setup is necessary. The RNG can be taken out of the picture, and I don't see why more than this simple loop is needed. If the compiler plays tricks, you might need to mark the variables as volatile, which makes the compiler (or rather, most compilers) treat accesses to them as if they were external function calls.

Since the RNG is now almost never called, its cost no longer matters, so you can even use your OS's cryptographic RNG to get numbers that are indistinguishable (to any human) from true random numbers.