Why does ARM use two instructions to mask a value?


Question

For the following function...

uint16_t swap(const uint16_t value)
{
    return value << 8 | value >> 8;
}

...why does ARM gcc 6.3.0 with -O2 yield the following assembly?

swap(unsigned short):
  lsr r3, r0, #8
  orr r0, r3, r0, lsl #8
  lsl r0, r0, #16         # shift left
  lsr r0, r0, #16         # shift right
  bx lr

It appears the compiler is using two shifts to mask off the unwanted bytes, instead of using a logical AND. Could the compiler instead use and r0, r0, #4294901760?

Recommended answer

Older ARM instruction sets cannot encode arbitrary constants directly in an instruction. Instead, constants are placed in literal pools and read in via a memory load. The and you suggest can, I believe, only take an 8-bit literal with a shift. Your 0xFFFF0000 needs 16 bits, so it cannot be encoded in a single instruction. (The same holds for 0x0000FFFF, the mask that would keep the low 16 bits.)
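To make the encoding limit concrete, here is a minimal sketch that checks whether a 32-bit constant fits the classic ARM (A32) data-processing immediate form, which is an 8-bit value rotated right by an even amount. The function names are made up for illustration, and the check covers only that classic form, not Thumb-2 immediates or the newer movw/movt instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Hypothetical helper: true if `value` can be written as an 8-bit
 * constant rotated right by an even amount (0, 2, ..., 30), which is
 * the classic A32 immediate encoding. Undoing the rotation means
 * rotating left by the same even amount. */
bool is_a32_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(value, rot) <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

Under this check, 0xFF and 0xFF000000 are encodable, while 0xFFFF0000 and 0x0000FFFF are not, which is why a single and with either mask is not available on such targets.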

So we can load the constant from memory and do an and (slow), take 2 instructions to create the value and 1 to and (longer), or just shift twice cheaply and call it good.
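As a rough sketch of why the shift pair is a valid substitute, the two functions below model a 32-bit register in C and keep its low 16 bits in both ways. The function names are hypothetical, and the comments map each statement to the instruction from the question's output.

```c
#include <stdint.h>

/* Keep the low 16 bits using the shift pair the compiler emitted. */
uint32_t low16_by_shifts(uint32_t r0)
{
    r0 <<= 16; /* lsl r0, r0, #16: pushes the upper halfword out */
    r0 >>= 16; /* lsr r0, r0, #16: zero-fills the upper halfword */
    return r0;
}

/* Keep the low 16 bits using an AND with a 16-bit mask. */
uint32_t low16_by_and(uint32_t r0)
{
    return r0 & 0x0000FFFFu;
}
```

Both return the same value for every input. The difference is only in how the operation can be encoded: the shifts need no constant beyond a small shift count.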

The compiler chose the shifts and, honestly, that is plenty fast.

Now for a reality check:

Worrying about a single shift is a waste of time unless this is, for sure, a bottleneck. Even if the compiler's output were sub-optimal, you would almost never feel it. Worry about "hot" loops in your code rather than micro-ops like this. Looking at this out of curiosity is awesome. Worrying about the performance of this exact code in your app, not so much.

It has been noted by others here that newer versions of the ARM specification allow this sort of thing to be done more efficiently. This shows that it is important, when talking at this level, to specify the chip or at least the exact ARM spec we are dealing with. I was assuming an ancient ARM because your output lacks the "newer" instructions. If we are tracking compiler bugs, then this assumption may not hold, and knowing the specification is even more important. For a swap like this, there are indeed simpler instructions to handle it in later versions.
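As a hedged illustration of the "simpler instructions" point: ARMv6 and later provide byte-reverse and zero-extend instructions such as rev16 and uxth. One way to give the compiler a chance to use them is the __builtin_bswap16 builtin from GCC and Clang, which is not part of the original question. Whether it actually compiles to a single byte-reverse instruction depends on the target architecture and flags, and that is an assumption here rather than something checked against gcc 6.3.0.

```c
#include <stdint.h>

/* Byte-swap a 16-bit value via the GCC/Clang builtin. The function
 * name is hypothetical; on targets with a byte-reverse instruction
 * the compiler may lower this to it instead of shifts and an orr. */
uint16_t swap16_builtin(uint16_t value)
{
    return __builtin_bswap16(value);
}
```

For example, swap16_builtin(0x1234) returns 0x3412, the same result as the shift-and-or version in the question.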

Edit 2

One thing that could possibly make this faster is to inline the function. In that case, the compiler could interleave these operations with other work. Depending on the CPU, this could double the throughput here, as many ARM CPUs have 2 integer instruction pipelines. Spread the instructions out enough that there are no hazards, and away it goes. This has to be weighed against I-cache usage, but in a case where it mattered, you could see something better.
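A minimal sketch of the inlining idea, assuming the swap is called from a loop over a buffer: marking the helper static inline lets the compiler place its shifts directly in the loop body, where it is free to schedule them alongside the loads and stores. The names swap16 and swap_buffer are hypothetical, and whether any throughput gain appears depends on the CPU and compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Same swap as in the question, but eligible for inlining. */
static inline uint16_t swap16(uint16_t value)
{
    return (uint16_t)((value << 8) | (value >> 8));
}

/* Hypothetical caller: byte-swaps every element of buf in place.
 * With swap16 inlined there is no call or return per element. */
void swap_buffer(uint16_t *buf, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        buf[i] = swap16(buf[i]);
    }
}
```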
