CS:APP example uses idivq with two operands?


Question

I am reading about x86-64 (and assembly in general) through the book "Computer Systems: A Programmer's Perspective" (3rd edition). The author, in agreement with other sources on the web, states that idivq takes only one operand, just as this one claims. But some chapters later, the author gives an example with the instruction idivq $9, %rcx.

Two operands? I first thought this was a mistake but it happens a lot in the book from there.

Also, the dividend should be given from the quantity in registers %rdx (high-order 64 bits) and %rax (low-order 64 bits) - so if this is defined in the architecture then it does not seem possible that the second operand could be a specified dividend.


Here is an example of an exercise (too lazy to write it all down - so a picture is the way to go). It claims that GCC emits idivq $9, %rcx when compiling a short C function.

Answer

That's a mistake. Only imul has immediate and 2-register forms.

mul, div, and idiv still exist only in the one-operand form introduced with 8086, using RDX:RAX as the implicit double-width operand for output (and for input, in the case of division).

Or EDX:EAX, DX:AX, or AH:AL, depending on operand-size of course. Consult an ISA reference like Intel's manual, not this book! https://www.felixcloutier.com/x86/idiv

Also see "When and why do we sign extend and use cdq with mul/div?" and "Why should EDX be 0 before using the DIV instruction?"
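To make the implicit operands concrete, here is a rough C model of what the 32-bit form (idivl) does with EDX:EAX. The names regs32, model_cdq and model_idivl are made up for this sketch, and returning false stands in for the #DE fault the hardware would raise. It also assumes the usual two's-complement conversions that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two registers that idivl reads and writes. */
typedef struct {
    uint32_t eax; /* in: low half of dividend,  out: quotient  */
    uint32_t edx; /* in: high half of dividend, out: remainder */
} regs32;

/* Model of cdq: sign-extend EAX into EDX:EAX. */
static void model_cdq(regs32 *r) {
    r->edx = (r->eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* Model of "idivl src" (one explicit operand, the divisor).
   Returns false where real hardware would fault with #DE. */
static bool model_idivl(regs32 *r, int32_t src) {
    /* Assumes uint64_t -> int64_t conversion wraps (two's complement). */
    int64_t dividend = (int64_t)(((uint64_t)r->edx << 32) | r->eax);

    if (src == 0) return false;                          /* divide by zero */
    if (dividend == INT64_MIN && src == -1) return false; /* avoid C UB; overflows anyway */

    int64_t quot = dividend / src; /* truncates toward zero, like idiv */
    int64_t rem  = dividend % src;
    if (quot < INT32_MIN || quot > INT32_MAX) return false; /* quotient overflow */

    r->eax = (uint32_t)(int32_t)quot;
    r->edx = (uint32_t)(int32_t)rem;
    return true;
}
```

Calling model_cdq before model_idivl corresponds to the usual cdq / idivl pair for an ordinary 32-bit / 32-bit division; the 64-bit idivq works the same way with cqo and RDX:RAX.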

x86-64's only hardware division instructions are idiv and div. 64-bit mode removed aam, which does 8-bit division by an immediate. ("Dividing in Assembler x86" and "Displaying Time in Assembly" have examples of using aam in 16-bit mode.)

Of course for division by constants idiv and div (and aam) are very inefficient. Use shifts for powers of 2, or a multiplicative inverse otherwise, unless you're optimizing for code-size instead of performance.
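As a small sketch of the power-of-2 case, the functions below (udiv8 and sdiv8 are names invented here) divide by 8 with shifts. The signed version adds a bias for negative inputs so that it truncates toward zero like C's / operator and idiv do. It assumes >> on a negative int64_t is an arithmetic shift, which holds for GCC and Clang but is implementation-defined in ISO C.

```c
#include <stdint.h>

/* Unsigned division by 8 is a plain logical right shift. */
static uint64_t udiv8(uint64_t x) {
    return x >> 3;
}

/* Signed division by 8, truncating toward zero.
   Negative inputs get a bias of 7 (divisor - 1) before the shift;
   non-negative inputs get a bias of 0.
   Assumption: >> on negative int64_t is an arithmetic shift. */
static int64_t sdiv8(int64_t x) {
    int64_t bias = (x >> 63) & 7; /* 7 if x < 0, else 0 */
    return (x + bias) >> 3;
}
```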


CS:APP 3e Global Edition apparently has multiple serious x86-64 instruction-set mistakes like this in practice problems, claiming that GCC emits impossible instructions. Not just typos or subtle mistakes, but misleading nonsense that's very obviously wrong to people familiar with the x86-64 instruction set. It's not just a syntax mistake; it's trying to use instructions that aren't encodable. (No syntax can exist to express them, other than a macro that expands to multiple instructions. Defining idivq as a pseudo-instruction using a macro would be pretty weird.)

e.g. "I correctly guessed missing part of a function, but gcc generated assembly code doesn't match the answer" is another one, where it suggests that (%rbx, %rdi, %rsi) and (%rsi, %rsi, 9) are valid addressing modes! The scale factor is actually a 2-bit shift count, so these are total garbage and a sign of a serious lack of knowledge by the authors about the ISA they're teaching, not a typo.

Their code won't assemble with any AT&T syntax assembler.

Also "What does this x86-64 addq instruction mean, which only have one operand? (From CSAPP book 3rd Edition)" is another example, where they have a nonsensical addq %eax instead of inc %rdx, and a mismatched operand-size in a mov store.


It seems that they're just making stuff up and claiming it was emitted by GCC. IDK if they start with real GCC output and edit it into what they think is a better example, or actually write it by hand from scratch without testing it.

GCC's actual output would have used multiplication by a magic constant (a fixed-point multiplicative inverse) to divide by 9, even at -O0 (and this is clearly not debug-mode code anyway). To get an actual idiv they could have used -Os.
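For illustration, this is roughly what the multiplicative-inverse trick looks like in C for the simpler unsigned 32-bit case. The function name udiv9 is made up for this sketch. The constant 0x38E38E39 is ceil(2^33 / 9). The signed 64-bit case from the book's exercise follows the same idea but needs the high half of a 64x64-bit multiply plus a sign fixup, which is not shown here.

```c
#include <stdint.h>

/* Unsigned 32-bit x / 9 without a division instruction.
   m = 0x38E38E39 = ceil(2^33 / 9), so 9*m = 2^33 + 1.
   The small excess is not enough to change floor((x * m) / 2^33)
   for any 32-bit x, so the result equals x / 9 exactly. */
static uint32_t udiv9(uint32_t x) {
    uint64_t product = (uint64_t)x * 0x38E38E39u; /* fits: < 2^62 */
    return (uint32_t)(product >> 33);
}
```

A widening multiply and a shift are typically much cheaper than a hardware divide, which is why compilers prefer this form when optimizing for speed.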

Presumably they didn't want to talk about Why does GCC use multiplication by a strange number in implementing integer division? and replaced that block of code with their made-up instruction. From context you can probably figure out where they expect the output to go; perhaps they mean rcx /= 9.


These errors are from 3rd-party practice problems in the Global Edition

From the book's official web site (https://csapp.cs.cmu.edu/3e/errata.html):

Note on the Global Edition: Unfortunately, the publisher arranged for the generation of a different set of practice and homework problems in the global edition. The person doing this didn't do a very good job, and so these problems and their solutions have many errors. We have not created an errata for this edition.

So CS:APP 3e is probably a good textbook, as long as you get the North American edition, or ignore the practice / homework problems. This explains the huge disconnect between the textbook's reputation and wide use vs. the serious and obvious (to people familiar with x86-64 asm) errors like this one that go beyond sloppy into don't-know-the-language territory.


How a hypothetical idiv reg, reg or idiv $imm, reg would be designed

Also, the dividend should be given from the quantity in registers %rdx (high-order 64 bits) and %rax (low-order 64 bits) - so if this is defined in the architecture then it does not seem possible that the second operand could be a specified dividend.

If Intel or AMD had introduced a new convenient form of div or idiv, they would have designed it to use a single-width dividend, because that's how compilers always use it.

Most languages are like C and implicitly promote both operands for + - * / to the same type and produce a result of that width. Of course if the inputs are known to be narrow that can be optimized away. (e.g. using one imul r32 to implement a * (int64_t)b).

But div and idiv fault if the quotient overflows, so it's not safe to use a single 32-bit idiv when compiling int32_t q = (int64_t)a / (int32_t)b.

Compilers always use xor edx,edx before DIV or cdq or cqo before IDIV to actually do n / n => n-bit division.

Real full-width division using a dividend that isn't just zero- or sign-extended is only done by hand with intrinsics or asm (because gcc/clang and other compilers don't know when the optimization is safe), or in gcc helper functions that do e.g. 64-bit / 64-bit division in 32-bit code. (Or 128-bit division in 64-bit code).

So what would be most helpful is a div/idiv that avoids the extra instruction to set up RDX, too, as well as minimizing the number of implicit register operands. (Like imul r32, r/m32 and imul r32, r/m32, imm do: making the common case of non-widening multiplication more convenient with no implicit registers. That's Intel-syntax like the manuals, destination first)

The simplest way would be a 2-operand instruction that did dst /= src. Or maybe replaced both operands with quotient and remainder. Using a VEX encoding for 3 operands like BMI1 andn, you could maybe have idivx remainder_dst, dividend, divisor, with the 2nd operand also an output for the quotient. Or you could have the remainder written to RDX with a non-destructive destination for the quotient.

Or, more likely, to optimize for the simple case where only the quotient is needed: idivx quot, dividend, divisor, without storing the remainder anywhere. You can always use regular idiv when you want the remainder.

BMI2 mulx uses an implicit rdx input operand because its purpose is to allow multiple dependency chains of add-with-carry for extended-precision multiply. So it still has to produce 2 outputs. But this hypothetical new form of idiv would exist to save code-size and uops around normal uses of idiv that aren't widening. So 386 imul reg, reg/mem is the point of comparison, not BMI2 mulx.

IDK if it would make sense to introduce an immediate form of idivx as well; you'd only use it for code-size reasons. Multiplicative inverses are more efficient for division by constants, so there's very little real-world use-case for such an instruction.
