Why do INC and ADD 1 have different performance?

Views: 208 Published: 2020/5/21 20:53:41 Tags: optimization, assembly, x86, hardware, cpu-architecture


Question

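Over the years I have read many times that you should use XOR ax, ax because it is faster, or that when programming in C you should use counter++ or counter += 1 because they compile to INC or ADD. I have also read that on the NetBurst Pentium 4, INC was slower than ADD 1, so the compiler had to be told that the target was NetBurst in order to translate every var++ into ADD 1.

My question is: why do INC and ADD have different performance? Why, for example, was INC claimed to be slower on NetBurst while being faster than ADD on other processors?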

Solution

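On the x86 architecture, INC updates only a subset of the condition codes (it leaves the carry flag, CF, unchanged), whereas ADD updates the entire set. (Other architectures have different rules, so this discussion may or may not apply to them.)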

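So an INC instruction must wait for earlier instructions that update the condition code bits to finish before it can modify that previous value to produce its final condition code result.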

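ADD can produce its final condition code bits without regard to the previous values of the condition codes, so it does not need to wait for earlier instructions to finish computing theirs.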
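The difference in inputs can be sketched with a small C model. This is an illustration only, not how any processor is implemented: the `Flags` struct and the names `model_add1` and `model_inc` are hypothetical, and only CF, ZF, SF and OF are tracked (AF and PF are left out for brevity).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified flags model (assumption: AF and PF are omitted). */
typedef struct {
    bool cf; /* carry */
    bool zf; /* zero */
    bool sf; /* sign */
    bool of; /* signed overflow */
} Flags;

/* Models ADD r32, 1: every tracked flag is derived from the operand alone. */
Flags model_add1(uint32_t *reg) {
    uint32_t old = *reg;
    uint32_t res = old + 1u;
    Flags f;
    f.cf = res < old;            /* unsigned wraparound sets carry */
    f.zf = res == 0;
    f.sf = (res >> 31) != 0;
    f.of = old == 0x7FFFFFFFu;   /* largest positive value + 1 overflows */
    *reg = res;
    return f;
}

/* Models INC r32: same arithmetic, but CF is not written, so the
   resulting flags value needs the previous flags as an extra input. */
Flags model_inc(uint32_t *reg, Flags prev) {
    Flags f = model_add1(reg);
    f.cf = prev.cf;              /* carried over from an earlier instruction */
    return f;
}
```

Note that `model_inc` takes a `prev` argument while `model_add1` does not: that extra input is the dependency the answer describes.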

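Consequence: you can execute ADD in parallel with lots of other instructions, and INC with fewer other instructions. Thus, ADD appears to be faster in practice.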
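A toy critical-path model makes the consequence concrete. Everything here is an assumption made for illustration: the `Op` struct, the `total_cycles` function, unlimited execution units, and flags as the only dependency between instructions. The latencies a caller passes in are made-up numbers, not measurements of any real CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* One instruction in the toy model (hypothetical type). */
typedef struct {
    int latency;        /* cycles from start to result */
    bool reads_flags;   /* needs the previous flags value, like INC */
    bool writes_flags;  /* produces a new flags value */
} Op;

/* Returns the cycle at which the whole sequence has finished.
   An op that reads flags cannot start before the most recent
   flags value (in program order) is available; any other op
   starts at cycle 0. */
int total_cycles(const Op *ops, size_t n) {
    int flags_ready = 0;
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int start = ops[i].reads_flags ? flags_ready : 0;
        int finish = start + ops[i].latency;
        if (ops[i].writes_flags) {
            flags_ready = finish;
        }
        if (finish > total) {
            total = finish;
        }
    }
    return total;
}
```

With a hypothetical 4-cycle flag-writing instruction followed by three 1-cycle increments of unrelated registers, marking the increments as `reads_flags` (the INC case) serializes them behind the slow instruction, while leaving it clear (the ADD case) lets them all overlap with it.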

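(I believe there is a similar issue when working with 8-bit registers (e.g., AL) in the context of full-width registers (e.g., EAX), in that an AL update requires previous EAX updates to complete first.)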
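The same idea can be sketched for partial registers. The functions `write_eax` and `write_al` below are hypothetical helpers that model only the data flow, assuming the register is treated as one 32-bit unit; whether a particular processor actually stalls in this situation depends on its microarchitecture.

```c
#include <stdint.h>

/* Models MOV EAX, imm32: the result ignores the old register contents. */
uint32_t write_eax(uint32_t old_eax, uint32_t value) {
    (void)old_eax;  /* a full-width write overwrites every bit */
    return value;
}

/* Models MOV AL, imm8: bits 8..31 of the old EAX must be kept,
   so the old value is a genuine input to the new one. */
uint32_t write_al(uint32_t old_eax, uint8_t value) {
    return (old_eax & 0xFFFFFF00u) | value;
}
```

For example, `write_al(0x12345678u, 0xAB)` yields `0x123456AB`, which cannot be computed until the earlier value of EAX is known.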

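I don't use INC or DEC in my high-performance assembly code anymore. If you aren't ultrasensitive to execution times, then INC or DEC is just fine and can reduce the size of your instruction stream.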

