How exactly does the x86 LOOP instruction work?


Question


            mov    ecx, 16
looptop:    .
            .
            .
            loop looptop

How many times will this loop execute?

What happens if ecx = 0 to start with? Does loop jump or fall-through in that case?

Solution

loop is exactly like dec ecx / jnz, except it doesn't set flags.

It's like the bottom of a do {} while(--ecx != 0); in C. If execution enters the loop with ecx = 0, wrap-around means the loop will run 2^32 times. (Or 2^64 times in 64-bit mode, because it uses RCX.)
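
As a rough illustration of that do/while behaviour, here is a minimal C sketch. It assumes an empty loop body that never touches the counter, and it models the 16-bit CX case so the wrap-around run finishes quickly. The function name `run_loop_cx` is made up for this example.

```c
#include <stdint.h>

/* Hypothetical model of a loop that ends with `loop`, using a 16-bit
   counter (CX). Returns how many times the loop body ran. */
static uint32_t run_loop_cx(uint16_t cx)
{
    uint32_t iterations = 0;
    do {
        iterations++;        /* the body always runs before the check */
    } while (--cx != 0);     /* `loop`: decrement, then branch if non-zero */
    return iterations;
}
```

With this model, `run_loop_cx(16)` returns 16, and `run_loop_cx(0)` returns 65536 because the first decrement wraps 0 to 0xFFFF. The same reasoning gives 2^32 for ECX and 2^64 for RCX.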

Unlike rep movsb/stosb/etc., it doesn't check for ECX=0 before decrementing, only after (see footnote 1).

The address-size determines whether it uses CX, ECX, or RCX. So in 64-bit code, addr32 loop is like dec ecx / jnz, while a regular loop is like dec rcx / jnz. Or in 16-bit code, it normally uses CX, but an address-size prefix (0x67) will make it use ecx. As Intel's manual says, it ignores REX.W, because that sets the operand-size, not the address-size.
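
To make the width dependence concrete, here is a small C sketch of a single `loop` execution with the counter width passed in explicitly. It is only a model: `loop_step` and its parameters are names invented for this example, it assumes the width is 16, 32 or 64, and it assumes the caller passes a value that already fits in that width. It does not try to model what happens to the upper bits of the full register.

```c
#include <stdint.h>

/* Hypothetical model of one `loop` execution.
   width_bits stands for the effective address size: 16 (CX), 32 (ECX)
   or 64 (RCX). Returns 1 if the branch would be taken, 0 on fall-through. */
static int loop_step(uint64_t *count, unsigned width_bits)
{
    uint64_t mask = (width_bits == 64)
                        ? UINT64_MAX
                        : ((UINT64_C(1) << width_bits) - 1);
    *count = (*count - 1) & mask;   /* decrement wraps at the chosen width */
    return *count != 0;             /* taken only if the new value is non-zero */
}
```

Starting from 0, one step leaves 0xFFFF for a 16-bit counter and 0xFFFFFFFF for a 32-bit one, and the branch is taken in both cases.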

rep string instructions use the address-size prefix the same way, overriding the address size but also RCX vs. ECX (or CX vs. ECX in modes other than 64-bit). The operand-size for string instructions is already used to determine movsw vs. movsd vs. movsq, and you want address/repeat size to be orthogonal to that. Having loop and jrcxz/jecxz follow that behaviour is just continuing the design intent from 8086 of loop being intended for use with string operations when a simple rep couldn't get the job done; see below.

Related: "Why are loops always compiled into "do...while" style (tail jump)?" has more about loop structure in asm, while() {} vs. do {} while(), and how to lay them out.


Footnote 1: jcxz (or x86-64 jrcxz) was intended for use before the top of a do {} while style loop, to skip it if it should run 0 times. On modern CPUs test rcx, rcx / jz is more efficient.
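
The guard described in this footnote can be sketched in C like this. It's a model, not generated code: the early return plays the role of jcxz/jecxz (or test/jz) placed before the loop top, and `run_guarded_loop` is a name made up for this example.

```c
#include <stdint.h>

/* Hypothetical model of a do/while loop with a zero-count guard on entry.
   Returns how many times the loop body ran. */
static uint32_t run_guarded_loop(uint32_t ecx)
{
    uint32_t iterations = 0;
    if (ecx == 0)            /* jecxz, or test ecx,ecx / jz: skip the loop */
        return iterations;
    do {
        iterations++;        /* loop body */
    } while (--ecx != 0);    /* loop, or dec ecx / jnz */
    return iterations;
}
```

With the guard in place a count of 0 runs the body 0 times instead of wrapping around, while non-zero counts behave exactly as before.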

Stephen Morse, architect of 8086, wrote about the intended uses of loop/jcxz with string instructions in that section of his book, The 8086 Primer, available for free on his web site: https://www.stevemorse.org/8086/index.html. See the "complex string instructions" subsection, starting at the bottom of page 71. (Or start reading from earlier in the chapter, the whole String Instructions section starts on page 66. But note @ecm's review of a few things the book seems to explain poorly or incorrectly.)

If you're wondering about the design intent of x86 instructions, you won't find a better source than this. That's separate from the best / most efficient way to use them, especially on modern x86, but it's a very good intro for beginners to what you can do with asm instructions as building blocks.


Extra debugging tips

If you ever want to know the details on an instruction, check the manual: either Intel's official vol.2 PDF instruction set reference manual, or an html extract with each entry on a different page (http://felixcloutier.com/x86/). But note that the HTML leaves out the intro and appendices that have details on how to interpret stuff, like when it says "flags are set according to the result" for instructions like add.

And you can (and should) also just try stuff in a debugger: single-step and watch registers change. Use a smaller starting value for ecx so you get to the interesting ecx=1 part sooner. See also the x86 tag wiki for links to manuals, guides, and asm debugging tips at the bottom.


And BTW, if the instructions inside the loop that aren't shown modify ecx, it could loop any number of times. For the question to have a simple and unique answer, you need a guarantee that the instructions between the label and the loop instruction don't modify ecx. (They could save/restore it, but if you're going to do that it's usually better to just use a different register as the loop counter. push/pop inside a loop makes your code hard to read.)


Rant about over-use of LOOP even when you already need to increment something else in the loop. LOOP isn't the only way to loop, and usually it's the worst.

You should normally never use the loop instruction unless optimizing for code-size at the expense of speed, because it's slow. Compilers don't use it. (So CPU vendors don't bother to make it fast; catch 22.) Use dec / jnz, or an entirely different loop condition. (See also http://agner.org/optimize/ to learn more about what's efficient.)

Loops don't even have to use a counter; it's often just as good if not better to compare a pointer to an end address, or to check for some other condition. (Pointless use of loop is one of my pet peeves, especially when you already have something in another register that would work as a loop counter.) Using cx as a loop counter often just ties up one of your precious few registers when you could have used cmp/jcc on another register you were incrementing anyway.
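
As a sketch of a counter-free loop, here is a C function that walks an array by comparing the pointer against an end address. `sum_array` is a hypothetical example, not something from the question. In asm the same shape would typically be an add on the pointer register followed by cmp/jne against the end pointer, with no separate count register tied up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum `len` 32-bit values without a loop counter.
   The loop condition is "pointer has not reached the end address". */
static int64_t sum_array(const int32_t *p, size_t len)
{
    const int32_t *end = p + len;   /* one-past-the-end address */
    int64_t total = 0;
    while (p != end) {              /* compare the pointer, not a count */
        total += *p++;
    }
    return total;
}
```

Because the condition is checked before the first iteration, an empty array (len = 0) is handled without any special case.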

IMO, loop should be considered one of those obscure x86 instructions that beginners shouldn't be distracted with, like stosd (without a rep prefix), aam or xlatb. It does have real uses when optimizing for code size, though. (That's sometimes useful in real life for machine code such as boot sectors, not just for things like code golf.)

IMO, just teach / learn how conditional branches work, and how to make loops out of them. Then you won't get stuck thinking there's something special about a loop that uses loop. I've seen an SO question or comment that said something like "I thought you had to declare loops", from someone who didn't realize that loop was just an instruction.

</rant>. Like I said, loop is one of my pet peeves. It's an obscure code-golfing instruction, unless you're optimizing for an actual 8086.
