Does Rust's array bounds checking affect performance?

Question

I'm coming from C and I wonder whether Rust's bounds checking affects performance. It probably needs some additional assembly instructions for every access, which could hurt when processing lots of data.

On the other hand, the expensive part of processor performance is memory access, so a few more arithmetic instructions might not hurt. Then again, once a cache line is loaded, sequential access should be very fast, so the extra instructions might matter after all.

Has somebody benchmarked this?

Recommended answer

Unfortunately, the cost of a bounds check is not straightforward to estimate. It's certainly not "one cycle per check", or any similarly easy-to-guess cost. It will have a nonzero impact, but that impact might be insignificant.
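
For a concrete picture of what is being costed, the function below spells out roughly what indexing a slice does: compare the index with the length, and branch to a panic if it is out of range. It is a hand-written illustration (the name `checked_index` is hypothetical); the standard library actually routes `slice[i]` through the `Index` trait, and the optimizer is free to drop the check whenever it can prove the check always passes.

```rust
/// Hypothetical illustration of what `slice[i]` conceptually does.
fn checked_index(slice: &[u32], i: usize) -> u32 {
    // The bounds check: one compare against the length plus a
    // (normally well-predicted) branch to the panic path.
    if i >= slice.len() {
        panic!(
            "index out of bounds: the len is {} but the index is {}",
            slice.len(),
            i
        );
    }
    // SAFETY: we just verified that `i < slice.len()`.
    unsafe { *slice.get_unchecked(i) }
}
```

Calling `checked_index(&[10, 20, 30], 1)` returns `20`, exactly like `[10, 20, 30][1]`, and an index of `3` panics in both cases.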

In theory, it would be possible to measure the cost of bounds checking on basic types like Vec by modifying Rust to disable them and running a large-scale ecosystem test. This would give some kind of rule of thumb, but without doing it, it's quite hard to know whether this will be closer to a ten percent or a tenth of a percent overhead.

There are some ways you can do better than timing and guessing, though. These rules of thumb apply mostly to desktop-class hardware; lower-end hardware, or hardware that targets a different niche, will have different characteristics.

If your indices are derived from the container size, there is a good chance that the compiler can eliminate the bounds checks entirely. In that case, the only cost of bounds checks in a release build is that they occasionally get in the optimizer's way, which could, but normally doesn't, impede other optimizations.
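
As an illustration, compare a loop whose bound is the slice's own length with one whose bound is an unrelated parameter. The function names are hypothetical, and whether a given check is actually removed depends on the compiler version and optimization level, so confirm it in the generated assembly. The re-slicing in `sum_prefix` is a common idiom, not something taken from the original answer.

```rust
/// Indices come from `0..v.len()`, so the optimizer can usually prove
/// `i < v.len()` and drop the check (not guaranteed).
fn sum_all(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

/// `n` is an unrelated parameter. Re-slicing once up front performs a
/// single check (panicking if `n > v.len()`); afterwards the loop bound is
/// the new slice's own length, so the per-element checks can usually go.
fn sum_prefix(v: &[u64], n: usize) -> u64 {
    let v = &v[..n]; // one bounds check instead of `n` of them
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}
```

In idiomatic code the same effect is usually obtained with `v.iter().sum()`; the explicit loops are kept here only to show where the checks would be.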

If your code is branchy, heavy on memory access, or otherwise hard to optimise, and the bounds to check are easy to access, there is a good chance that bounds checking will mostly happen in the CPU's spare bandwidth, with branch prediction helping in particular. In that case the overall cost will be particularly small, especially compared to the cost of the rest of the code.

If the bounds to check are behind several layers of pointers, it is plausible that you will hit memory latency and suffer correspondingly. However, it is also plausible that the CPU's speculation and prediction machinery will manage to hide this; it is very context-dependent. The risk grows if you take references to the data inside rather than dereferencing it at the same time as the bounds check.
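
A sketch of the pointer-chasing case, using hypothetical `Outer`/`Inner` types: in `scale_nested` every access reaches through `outer.inner.data`, so the length being checked lives behind a `Box` and a `Vec` header, while `scale_hoisted` binds the slice once before the loop. Because `&mut` references cannot alias, the optimizer may well hoist those loads by itself; binding the slice explicitly just makes the intent clear and gives the optimizer less to prove.

```rust
// Hypothetical types: the length of `data` sits behind two pointers.
struct Inner {
    data: Vec<f32>,
}

struct Outer {
    inner: Box<Inner>,
}

/// Each `data[i]` is checked against a length reached via `outer.inner`.
fn scale_nested(outer: &mut Outer, indices: &[usize], factor: f32) {
    for &i in indices {
        outer.inner.data[i] *= factor;
    }
}

/// The slice (pointer + length) is fetched once; the per-index checks
/// remain, because `indices` can contain arbitrary values.
fn scale_hoisted(outer: &mut Outer, indices: &[usize], factor: f32) {
    let data: &mut [f32] = &mut outer.inner.data;
    for &i in indices {
        data[i] *= factor;
    }
}
```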

If your bounds checks are in a tight arithmetic loop that doesn't saturate the core, you aren't likely to hurt throughput directly, except by impeding other compiler optimisations. However, impeding other compiler optimisations can be arbitrarily bad, anywhere from making no difference to preventing SIMD and causing a tenfold slowdown.
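
To illustrate how a check can get in the way of vectorization, here are three hypothetical versions of an integer dot product. In `dot_indexed` the loop bound `n` is unrelated to either slice length, so each iteration usually keeps its checks, and the possibility of panicking partway through tends to stop the loop from being vectorized. `dot_hoisted` pays two checks up front, and `dot_zip` lets the iterator handle the bounds. Integers are used deliberately: a floating-point sum generally won't be vectorized either way, because reordering it would change the result. Whether SIMD code is actually emitted is something to confirm in the assembly, not assume.

```rust
/// `n` is unrelated to `a.len()` and `b.len()`, so the checks on
/// `a[i]` and `b[i]` generally stay inside the loop.
fn dot_indexed(a: &[i32], b: &[i32], n: usize) -> i32 {
    let mut acc = 0;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}

/// Two up-front checks (panicking if `n` exceeds either length); inside
/// the loop the optimizer can typically see that `i < n == a.len() == b.len()`.
fn dot_hoisted(a: &[i32], b: &[i32], n: usize) -> i32 {
    let (a, b) = (&a[..n], &b[..n]);
    let mut acc = 0;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}

/// `zip` stops at the shorter slice, so no per-element check is needed.
fn dot_zip(a: &[i32], b: &[i32]) -> i32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

All three return `32` for `[1, 2, 3]` and `[4, 5, 6]` (with `n = 3` for the indexed versions); they differ only in where the bounds are checked.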

If your bounds checks are in a tight arithmetic loop that does saturate the core, you take on the above risk and have a direct execution penalty of roughly half a cycle per bounds check.

If your code is large enough to stress the instruction cache, you also need to worry about the impact on code size. The increase is normally modest, but its effect on running time is particularly hard to measure.

Peter Cordes adds some further points in comments. First, bounds checks imply loads and stores, so you're going to be running a mixed workload, which is most likely to bottleneck on issue/rename. Second, even correctly predicted branches executed in parallel take resources from the predictor, which can cause other branches to be predicted worse.

This might seem intimidating, and it is. That is why it's important to measure and understand your performance at the level that is relevant for you and your code.
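
A crude way to start measuring, sketched with only the standard library (`std::time::Instant` and `std::hint::black_box`, the latter stable since Rust 1.66): time an indexed sum whose bound is hidden from the optimizer against an iterator sum, keeping the fastest of several runs. The helper names are hypothetical, and timings from a loop like this are noisy; a dedicated harness such as the `criterion` crate is more trustworthy. Build with `--release`, since debug builds say nothing about optimized performance.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Indexed sum; `n` is passed separately so the checks can't be proven away.
fn sum_indexed(v: &[u64], n: usize) -> u64 {
    let mut total: u64 = 0;
    for i in 0..n {
        total = total.wrapping_add(v[i]);
    }
    total
}

/// Iterator sum; no per-element bounds check is needed.
fn sum_iter(v: &[u64]) -> u64 {
    v.iter().fold(0u64, |acc, &x| acc.wrapping_add(x))
}

/// Runs `f` `reps` times and returns the fastest run in nanoseconds.
fn time_min<F: FnMut() -> u64>(reps: u32, mut f: F) -> u128 {
    let mut best = u128::MAX;
    for _ in 0..reps {
        let start = Instant::now();
        black_box(f());
        best = best.min(start.elapsed().as_nanos());
    }
    best
}

/// Times both sums over `len` elements and prints the fastest run of each.
fn compare(len: usize) -> (u128, u128) {
    let v: Vec<u64> = (0..len as u64).collect();
    let s: &[u64] = &v;
    // `black_box` hides the fact that `n == s.len()` from the optimizer.
    let n = black_box(s.len());
    assert_eq!(sum_indexed(s, n), sum_iter(s)); // same result either way
    let indexed = time_min(20, || sum_indexed(black_box(s), n));
    let iter = time_min(20, || sum_iter(black_box(s)));
    println!("indexed: {indexed} ns, iterator: {iter} ns");
    (indexed, iter)
}
```

Call `compare(1_000_000)` from `main` in a release build. No particular numbers are implied here; they depend entirely on the machine, the compiler version, and the surrounding code.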

Because Rust was "born" with bounds checking, it has also developed ways to reduce their cost, such as pervasive zero-cost references, iterators (which absorb, but don't actually remove, bounds checks), and an unusual set of handy utility functions. If you find yourself hitting a pathological case, Rust also offers unsafe escape hatches.
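
A few of those tools, sketched on hypothetical functions: `get` returns an `Option` instead of panicking, `chunks_exact` yields fixed-size chunks (which usually lets the compiler drop the checks on in-chunk indexing), and `get_unchecked` is the `unsafe` escape hatch that skips the check entirely. With `get_unchecked`, an out-of-range index is undefined behaviour, so the caller has to establish the invariant some other way; the up-front `assert!` below stands in for that, though in real code the invariant would usually follow from how the indices were built.

```rust
/// `get` returns `None` for an out-of-range index instead of panicking.
fn lookup(table: &[u8], i: usize) -> u8 {
    table.get(i).copied().unwrap_or(0)
}

/// `chunks_exact(2)` yields slices of length exactly 2 (a trailing odd
/// element is ignored), so `c[0]` and `c[1]` are usually check-free.
fn sum_pairs(v: &[u32]) -> Vec<u32> {
    v.chunks_exact(2).map(|c| c[0] + c[1]).collect()
}

/// Escape hatch: `get_unchecked` performs no bounds check at all.
fn gather(src: &[f64], idx: &[usize]) -> Vec<f64> {
    // Establish the invariant first so the unchecked accesses are sound.
    assert!(idx.iter().all(|&i| i < src.len()));
    idx.iter()
        // SAFETY: every `i` was checked against `src.len()` above.
        .map(|&i| unsafe { *src.get_unchecked(i) })
        .collect()
}
```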