锈的基准优化出来 [英] Rust benchmark optimized out

查看:114
本文介绍了锈的基准优化出来的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从Rust哈希映射获取密钥。我有以下基准:

 #[bench] 
fn rust_get(b:& mut Bencher){
let(hash,keys)=
get_random_hash ::< HashMap< String,usize>>(& HashMap :: with_capacity,& rust_insert_fn);
let mut keys = test :: black_box(keys);
b.iter(|| {
for key.drain(..){
hash.get(& k);
}
}) ;

其中 get_random_hash 是定义为:

pre $ fn get_random_hash< T>(
new:& Fn(usize) - > T,
insert:& Fn(& mut T,String,usize) - >(),
) - > (T,Vec< String>){
let mut keys = Vec :: with_capacity(HASH_SIZE);
let mut hash = new(HASH_CAPACITY);
for i in 0..HASH_SIZE {
let k:String = format!({},Uuid :: new_v4());
keys.push(k.clone());
插入(& mut hash,k,i);
}
return(hash,keys);

rust_insert_fn 是:
$ b $ pre $ f $ rust_insert_fn(map:& mut HashMap< String,usize>,key:String,value:usize){
map.insert(key,value);
}

然而,当我运行基准测试时,测试基准::基准测试:: rust_get ... bench:1 ns / iter(+/- 0)

我认为 test :: black_box可以解决问题,但看起来并不像。我甚至尝试用 test :: black_box`封装for循环中的 hash.get(& k ),但仍然优化了代码。我应该如何正确地让代码在未经优化的情况下运行?



编辑 - 即使以下几点也会优化get操作:



$ p $ lt; code>#[bench]
fn rust_get(b:& mut Bencher){
let(hash,keys)= get_random_hash ::< HashMap< String,usize>>(& HashMap :: with_capacity,& rust_insert_fn);
let mut keys = test :: black_box(keys);
b.iter(|| {
let mut n = 0;
for key.drain(..){
hash.get(& k);
n + = 1;
};
return n;
});
}

有趣的是,以下基准工作:

$ p $ lt; code>#[bench]
fn rust_get_random(b:& mut Bencher){
let(hash,_)= get_random_hash ::< HashMap< String,usize>>(& HashMap :: with_capacity,& rust_insert_fn);
b.iter(|| {
for _ in 0..HASH_SIZE {
hash.get(& format!({},Uuid :: new_v4()));
}
});

$ b#[bench]
fn rust_insert(b:& mut Bencher){
b.iter(|| {
let mut hash = HashMap :: with_capacity(HASH_CAPACITY);
for i in 0..HASH_SIZE {
let k:String = format!({},Uuid :: new_v4());
hash.insert(k,i);
}
});
}

但这也没有:
$ b (b:& mut Bencher){
let(mut hash,keys)= get_random_hash ::< code>#[bench]
fn rust_del HashMap< String,usize>>(& HashMap :: with_capacity,& rust_insert_fn);
let mut keys = test :: black_box(keys);
b.iter(|| {
for key.drain(..){
hash.remove(& k);
};
} );
}

这里是完整的要点。

解决方案

p>编译器优化器是如何工作的?


优化器只不过是一个管道分析和转换。每一个人的分析或转化都相对简单,应用它们的最佳顺序是未知的,一般由启发式方法确定。


这会如何影响我的基准?

基准比较复杂,因为您一般希望测量优化的代码,但同时可能会进行一些分析或转换删除你感兴趣的代码,使基准测试无用。



因此,熟悉所使用的特定优化器的分析和转换过程非常重要。以便能够理解:


  • 哪些是不合要求的,
  • 如何屏蔽它们。



如上所述,大多数通行证都相对简单,因此挫败它们也相对简单。难度在于它们有一百多个这样的事实,你必须知道哪一个人能够击败它。


我遇到了哪些优化?


有几种特定的优化常常与基准一起使用: / p>




什么?优化器如何破坏我的代码?


优化器在所谓的 as-if 规则。这个基本规则允许优化器执行任何不会改变程序输出的转换。也就是说,它不应该改变整个程序的可观察行为



最重要的是,通常明确允许一些更改。最明显的是运行时间预计会缩短,这意味着线程交错可能会有所不同,并且一些语言给予更多的摆动空间。


我用 black_box


什么是 black_box ?这是一个函数,其定义对于优化器来说特别是 opaque 。这对编译器允许执行的优化有一些影响,因为它可能有副作用。因此,这意味着:


  • 转换后的代码必须执行与 black_box 比原始代码

  • 转换后的代码必须按照与传入的参数相同的顺序执行所述调用

  • 不存在假设可以由 black_box



返回的值制作。因此,手术使用 black_box 可以阻止某些优化。但是,盲目使用可能不会阻止正确的使用。


我遇到了哪些优化?

让我们从简单的代码开始:

 #[bench] 
fn rust_get(b:& mut Bencher){
let(hash,mut keys):(HashMap< String,usize> ;, _)=
get_random_hash(& HashMap :: with_capacity ,& rust_insert_fn);

b.iter(|| {
for key.drain(..){
hash.get(& k);
}
});
}

假设是 b.iter ()将迭代所有键,并为它们中的每一个执行 hash.get()


  1. hash.get()的结果未被使用,
  2. hash.get()是一个 pure 函数,这意味着它没有副作用。

因此,这个循环可以改写为:

  b.iter(|| {for k in keys.drain(..){}})

死代码消除(或一些变体):代码没有任何用途,因此它被消除。



它甚至可能是编译器足够聪明认识到keys.drain(..){} 中的k的可以优化为 drop(keys)



然而, black_box 的外科手术应用可以为DCE打下基础:

  b.iter(|| {
for key.drain(..){
black_box(hash.get(安培; K));
}
});

根据上述 black_box 的影响:


  • 循环无法再优化,因为它会将调用次数改为 black_box code>,

  • 每次调用 black_box 必须使用预期的参数执行。



仍有一个可能的障碍:常量传播。特别是,如果编译器意识到所有的键都会产生相同的值,那么它可以优化 hash.get(& k)并将其替换为所述值。



这可以通过混淆密钥来实现: let mut keys = black_box(keys); ,如上所述,或者映射。如果您要对空白地图进行基准测试,则后者将是必要的,在这里它们是平等的。



因此我们得到:
$ b $ (bash& mut Bencher){
let(hash,keys):(HashMap< String,usize> ,_)=
get_random_hash(& HashMap :: with_capacity,& rust_insert_fn);
let mut keys = test :: black_box(keys);

b.iter(|| {
for key.drain(..){
test :: black_box(hash.get(& k));
}
});
}




最后提示。


基准比较复杂,您应该特别小心,只对基准进行基准测试。

在这个特殊情况下,有两个方法调用:


  • keys.drain()
  • hash.get()



由于基准名称暗示,对我而言,您的目标是衡量 get 的表现,我只能假设调用 keys.drain(..)是一个错误。



因此,基准确实应该是:

 #[bench] 
fn rust_get(b:& mut Bencher){
let(hash,keys) :(HashMap< String,usize> ;, _)=
get_random_hash(& HashMap :: with_capacity,& rust_insert_fn);
let keys = test :: black_box(keys);

b.iter(|| {
for& keys {
test :: black_box(hash.get(k));
}
});





$ b

在这种情况下,这更关键,因为闭包传递给 b.iter()预计会多次运行:如果您第一次使用这些密钥,以后会留下什么?一个空的 Vec ...



......实际上这可能只是这里发生的一切;因为 b.iter()运行闭包,直到它的时间稳定下来,它可能只是耗尽了第一个中的 Vec 运行,然后计时一个空循环。


I am trying to benchmark getting keys from a Rust hash map. I have the following benchmark:

#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, keys) =
        get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
    let mut keys = test::black_box(keys);
    b.iter(|| {
        for k in keys.drain(..) {
            hash.get(&k);
        }
    });
}

where get_random_hash is defined as:

fn get_random_hash<T>(
    new: &Fn(usize) -> T,
    insert: &Fn(&mut T, String, usize) -> (),
) -> (T, Vec<String>) {
    let mut keys = Vec::with_capacity(HASH_SIZE);
    let mut hash = new(HASH_CAPACITY);
    for i in 0..HASH_SIZE {
        let k: String = format!("{}", Uuid::new_v4());
        keys.push(k.clone());
        insert(&mut hash, k, i);
    }
    return (hash, keys);
}

and rust_insert_fn is:

fn rust_insert_fn(map: &mut HashMap<String, usize>, key: String, value: usize) {
    map.insert(key, value);
}

However, when I run the benchmark, it is clearly optimized out:

test benchmarks::benchmarks::rust_get        ... bench:           1 ns/iter (+/- 0)

I thought test::black_box would solve the problem but it doesn't look like it does. I have even tried wrapping thehash.get(&k) in the for loop withtest::black_box` but that still optimizes the code. How should I correctly get the code to run without being optimized out?

EDIT - Even the following does optimizes out the get operation:

#[bench]
fn rust_get(b: &mut Bencher) {
  let (hash, keys) = get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
  let mut keys = test::black_box(keys);
  b.iter(|| {
    let mut n = 0;
    for k in keys.drain(..) {
      hash.get(&k);
      n += 1;
    };
    return n;
  });
}

Interestingly, the following benchmarks work:

#[bench]
fn rust_get_random(b: &mut Bencher) {
  let (hash, _) = get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
  b.iter(|| {
    for _ in 0..HASH_SIZE {
      hash.get(&format!("{}", Uuid::new_v4()));
    }
  });
}

#[bench]
fn rust_insert(b: &mut Bencher) {
  b.iter(|| {
    let mut hash = HashMap::with_capacity(HASH_CAPACITY);
    for i in 0..HASH_SIZE {
      let k: String = format!("{}", Uuid::new_v4());
      hash.insert(k, i);
    }
  });
}

but this also does not:

#[bench]
fn rust_del(b: &mut Bencher) {
  let (mut hash, keys) = get_random_hash::<HashMap<String, usize>>(&HashMap::with_capacity, &rust_insert_fn);
  let mut keys = test::black_box(keys);
  b.iter(|| {
    for k in keys.drain(..) {
      hash.remove(&k);
    };
  });
}

Here is the full gist.

解决方案

How does a compiler optimizer work?

An optimizer is nothing more than a pipeline of analyses and transformations. Each individual analysis or transformation is relatively simple, and the optimal order to apply them is unknown and generally determined by heuristics.

How does this affect my benchmark?

Benchmarks are complicated in that in general you wish to measure optimized code, but at the same time some analyses or transformations may remove the code you were interested in rendering the benchmark useless.

It is therefore important to have a passing acquaintance with the analyses and transformation passes of the particular optimizer you are using so as to be able to understand:

  • which ones are undesirable,
  • how to foil them.

As mentioned, most passes are relatively simple, and therefore foiling them is relatively simple as well. The difficulty lies in the fact that there is a hundred or more of them and you have to know which one is kicking in to be able to foil it.

What optimizations am I running afoul of?

There are a few specific optimizations which very often play often with benchmarks:

What? How dare the optimizer mangle my code so?

The optimizer operates under the so-called as-if rule. This basic rule allows the optimizer to perform any transformation which does not change the output of the program. That is, it should not change the observable behavior of the program in general.

On top of that, a few changes are generally explicitly allowed. The most obvious being that the run-time is expected to shrink, this in turn means that thread interleaving may differ, and some languages give even more wiggle room.

I used black_box!

What is black_box? It's a function whose definition is specifically opaque to the optimizer. This has some implications on the optimizations the compiler is allowed to perform since it may have side-effects. This therefore mean:

  • the transformed code must perform the very same number of calls to black_box than the original code,
  • the transformed code must perform said calls in the same order with regard to the passed in arguments,
  • no assumption can be made on the value returned by black_box.

Thus, surgical use of black_box can foil certain optimizations. Blind use, however, may not foil the right ones.

What optimizations am I running afoul of?

Let's start from the naive code:

#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, mut keys): (HashMap<String, usize>, _) =
        get_random_hash(&HashMap::with_capacity, &rust_insert_fn);

    b.iter(|| {
        for k in keys.drain(..) {
            hash.get(&k);
        }
    });
}

The assumption is that the loop inside b.iter() will iterate over all keys and perform a hash.get() for each of them:

  1. The result of hash.get() is unused,
  2. hash.get() is a pure function, meaning that is has no side-effect.

Thus, this loop can be rewritten as:

b.iter(|| { for k in keys.drain(..) {} })

We are running afoul of Dead Code Elimination (or some variant): the code serves no purpose, thus it is eliminated.

It may even be that the compiler is smart enough to realize that for k in keys.drain(..) {} can be optimized into drop(keys).

A surgical application of black_box can, however, foil DCE:

b.iter(|| {
    for k in keys.drain(..) {
        black_box(hash.get(&k));
    }
});

As per the effects of black_box described above:

  • the loop can no longer be optimized out, as it would change the number of calls to black_box,
  • each call to black_box must be performed with the expected argument.

There is still one possible hurdle: Constant Propagation. Specifically if the compiler realizes that all keys yield the same value, it could optimize out hash.get(&k) and replace it by said value.

This can be achieved by obfuscating the keys: let mut keys = black_box(keys);, as you did above, or the map. If you were to benchmark an empty map, the latter would be necessary, here they are equal.

We thus get:

#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, keys): (HashMap<String, usize>, _) =
        get_random_hash(&HashMap::with_capacity, &rust_insert_fn);
    let mut keys = test::black_box(keys);

    b.iter(|| {
        for k in keys.drain(..) {
            test::black_box(hash.get(&k));
        }
    });
}

A final tip.

Benchmarks are complicated enough that you should be extra careful to only benchmark what you wish to benchmark.

In this particular case, there are two method calls:

  • keys.drain(),
  • hash.get().

Since the benchmark name suggests, to me, that what you aim for is to measure the performance of get, I can only assume that the call to keys.drain(..) is a mistake.

Thus, the benchmark really should be:

#[bench]
fn rust_get(b: &mut Bencher) {
    let (hash, keys): (HashMap<String, usize>, _) =
        get_random_hash(&HashMap::with_capacity, &rust_insert_fn);
    let keys = test::black_box(keys);

    b.iter(|| {
        for k in &keys {
            test::black_box(hash.get(k));
        }
    });
}

In this instance, this is even more critical in that the closure passed to b.iter() is expected to run multiple times: if you drain the keys the first time, what's left afterward? An empty Vec...

... which may actually be all that is really happening here; since b.iter() runs the closure until its time stabilizes, it may just be draining the Vec in the first run and then time an empty loop.

这篇关于锈的基准优化出来的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆