Doing something in assembly vs having the assembler do it


Question


Which of the following two methods is preferred to get 2^n - 1? Why is one preferred over the other?

# (2a) -- instructions
mov $1, %eax
shl $X, %eax
dec %eax

# (2b) -- assembler
mov $((1 << X) - 1), %eax

I find the first more readable personally but I'm pretty sure readability isn't the point of asm.

Solution

Always do as much as possible at assemble-time (once per build), not at runtime where it costs code size, and costs time every time this block executes.

They both start with the same mov $imm32, %eax form of mov, but then the first version wastes 2 extra instructions so it's total garbage with zero advantages, and looks super ugly and insane to anyone used to thinking about efficiency.
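As a rough C analogue of the two versions in the question (only a sketch: `X`, `mask_runtime` and `MASK_CONST` are hypothetical names, and an optimizing C compiler would probably fold the first version too, which an assembler will not do for you):

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

#define X 5           /* hypothetical shift count, stands in for the asm constant X */

/* (2a) analogue: three dependent steps carried out at run time. */
uint32_t mask_runtime(unsigned shift)   /* shift must be in 0..31 */
{
    uint32_t v = 1;   /* mov $1, %eax  */
    v <<= shift;      /* shl $X, %eax  */
    v -= 1;           /* dec %eax      */
    return v;
}

/* (2b) analogue: a constant expression, evaluated once per build. */
enum { MASK_CONST = (1u << X) - 1 };

/* The build itself checks the value; nothing is left to do at run time. */
static_assert(MASK_CONST == 31, "low 5 bits set");
```

Both forms yield the same value; the difference is only in when the work happens.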

There is no compiler to optimize your code into something the CPU can run efficiently; it's up to you to make that happen. If you don't care about performance, there's basically no reason to be messing around with asm in the first place [1], either writing it by hand or thinking about compiler output.

The fact that you even have to ask this is a sign you're either missing the point of assembly language (usually performance), or you're mistakenly thinking of asm the same way as you would a compiled language like C++. You need to adjust your mental model to think about the machine code you're creating, that the CPU will execute, and how to make that as efficient as possible.

You need to think like a compiler; "how can I do this in as few uops as possible for the front-end?" (https://agner.org/optimize), with minimum code-size in bytes as a tie-breaker. Or depending on your goals, maybe optimizing for code-size over speed. But anyway, compilers aggressively evaluate expressions and do constant-propagation as much as possible to combine constants in the source code into compile-time work instead of run-time.

Footnote 1: In that case, write in a language that has a nice optimizing compiler, e.g. C or Rust, and let it create machine code for you. (Although to be fair, a few things are easier in asm than C if you know both equally well, such as extended precision math. Very few high-level languages make it easy to use the carry output from other operations.)
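To illustrate the carry point in the footnote, here is a minimal sketch of a 128-bit add in portable C. The `u128` type and `add128` function are hypothetical names for this example. In x86 asm this would typically be just an add followed by an adc, because the carry flag is directly usable; in C the carry has to be reconstructed with a comparison.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/* Add two 128-bit values; wraps modulo 2^128. */
u128 add128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;                 /* low halves, may wrap around */
    uint64_t carry = (r.lo < a.lo);     /* wrapped iff result is smaller than an input */
    r.hi = a.hi + b.hi + carry;         /* the asm version would use adc here */
    return r;
}
```

Compilers often recognize this pattern and emit add/adc anyway, but the source has to spell out the carry by hand.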


Readability:

You are 100% correct that readability is usually not the top priority in asm; it always takes a back seat to code-size and/or performance in any case where it's worth writing asm by hand in the first place. But within those constraints, we can certainly aim for as much readability as possible.

Your runtime computation way is extremely surprising to experienced asm users reading your code, and not idiomatic at all. If I came across that in otherwise-sane code, it would take me some time to double-check and make sure I was understanding it properly (e.g. maybe there's some non-constant input to this after all, or maybe this sets FLAGS a certain way that's also needed later).

The only reason to do work at run-time is when it couldn't have been done at compile time (because it's not constant), so it would be very surprising to see a shift whose input came from a mov of an immediate constant. If I saw that sequence of 3 instructions to create a 32-bit constant in production code (not beginner questions on Stack Overflow), I'd be shocked at the incompetence of whoever wrote it, after figuring out it was just creating a 32-bit constant.

Apart from that, the runtime version is 2 more instructions to read, if this appears as part of a larger block of code. Code density (in terms of amount done per source line) is already low in asm, so minimizing instruction count is generally good for overall readability of a function.

(As well as usually being good for efficiency, except for cases like replacing a slow instruction like div with a multiplicative inverse + shift. But that's bad enough for readability that it's not too weird for hand-written asm to mov an immediate to a register and then div by it, if performance wasn't the top priority of that one function or block of code, e.g. because it doesn't run often. Unless the divisor is a power of 2, then it's just a really stupid less convenient alternative to a right shift.)
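For a sense of what the multiplicative inverse + shift looks like, here is a sketch in C for unsigned 32-bit division by 10 (the function names are hypothetical). The constant 0xCCCCCCCD is ceil(2^35 / 10), which is the widely used magic number for this divisor; other divisors need their own constant and shift.

```c
#include <stdint.h>

/* Readable version: what "mov an immediate, then div" computes. */
uint32_t div10_plain(uint32_t x)
{
    return x / 10;
}

/* Multiplicative-inverse version: same result for every 32-bit unsigned x,
 * using a 32x32 -> 64-bit multiply and a right shift by 35 instead of div. */
uint32_t div10_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Power-of-2 divisor: for unsigned values a plain right shift is enough. */
uint32_t div16_shift(uint32_t x)
{
    return x >> 4;
}
```

The second function is faster on most CPUs but much harder to read at a glance, which is the readability trade-off described above.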


(1<<n) - 1 is a pretty common idiom that most experienced asm programmers are familiar with. See also https://catonmat.net/low-level-bit-hacks (Many people will also be familiar with binary tricks like this from low-level experience in other languages, it's definitely not unique to asm.)

So for this case specifically, I'd really say just get used to seeing stuff like and $(1<<X) - 1, %eax. Or and $-16, %eax as a convenient way to write an AND mask that zeros the low 4 bits, rounding EAX down to a multiple of 16. (Taking advantage of 2's complement).
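The same two masks written in C, as a small sketch (function names are hypothetical) for readers who know the idiom from other languages:

```c
#include <stdint.h>

/* Analogue of: and $-16, %eax
 * -16 in 2's complement is 0xfffffff0, so the low 4 bits are cleared
 * and x is rounded down to a multiple of 16. */
uint32_t round_down_16(uint32_t x)
{
    return x & (uint32_t)-16;
}

/* Analogue of: and $(1<<X) - 1, %eax
 * Keeps only the low n bits of x; n must be in 0..31. */
uint32_t low_bits(uint32_t x, unsigned n)
{
    return x & ((UINT32_C(1) << n) - 1);
}
```

For example, `round_down_16(37)` gives 32, and `low_bits(37, 4)` gives 5, the remainder that was discarded.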

Macros

However, you can avoid repeating that expression everywhere you use it by defining an assemble-time constant like XMASK = (1<<X) - 1 that you can use instead.

Or you can do something like

#define SHIFT2MASK(x_)  ((1<<x_)-1)

...
X=3

mov   $SHIFT2MASK(X), %eax

and   $SHIFT2MASK(4), %ecx

and compile with gcc -c foo.S to run your asm source through the C preprocessor.

(GAS native macros work like instructions, not for single operands to other instructions, so a macro language like the C preprocessor is more convenient for this.)

The hard part with this approach is choosing a clear macro name that unambiguously conveys the fact that it turns a shift count into a mask with set bits up to that position. Not 0xfffffff0 or something, and not just 1<<4 either. For testing a bitmap, you would be doing stuff like test $1<<3, %al, and a mask could just as easily describe the value with 1 bit set at the appropriate position.

To be clear, SHIFT2MASK is not fully unambiguously named. Other than from context of how it's getting used, hopefully. Ideally it can be self-explanatory enough that the comments can be higher-level, describing the algorithm, not the nuts and bolts that the reader can already see in the code itself.
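One possible way to make the three kinds of "mask" distinguishable by name is sketched below in C. The macro names `LOW_MASK`, `BIT` and `ALIGN_MASK` are only suggestions invented for this example, not an established convention, and it assumes a 32-bit `unsigned int`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* n low bits set: what SHIFT2MASK produces. */
#define LOW_MASK(n)    ((UINT32_C(1) << (n)) - 1)

/* Exactly one bit set at position n: the "test $1<<3, %al" kind of mask. */
#define BIT(n)         (UINT32_C(1) << (n))

/* Everything except the n low bits: the 0xfffffff0 kind of mask. */
#define ALIGN_MASK(n)  (~LOW_MASK(n))

/* All three are constant expressions, checked at build time. */
static_assert(LOW_MASK(4)   == 0x0000000fu, "low 4 bits set");
static_assert(BIT(3)        == 0x00000008u, "only bit 3 set");
static_assert(ALIGN_MASK(4) == 0xfffffff0u, "low 4 bits clear");

/* Analogue of: test $1<<3, %al  (is bit 3 set?) */
int bit3_set(uint32_t flags)
{
    return (flags & BIT(3)) != 0;
}
```

With distinct names like these, a comment at the use site can describe the algorithm rather than which kind of mask is meant.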
