Two basic questions about generated assembly
Hello everyone,
I have two questions after reading this article:
http://www.microsoft.com/msj/0298/hood0298.aspx
1. Why is using LEA to do multiplication faster than using MUL?
"Using "LEA EAX,[EAX*4+EAX]" turns out to be faster than the MUL instruction."
2. "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does "linear address" mean? Is it something like an array, whose elements are placed next to each other? And what would a non-linear address be?
Thanks in advance,
George
George_George wrote: Why is using LEA to do multiplication faster than using MUL?
You'd have to know about the internal architecture and circuitry of the CPU to answer that; I don't, and I doubt many people would, apart from those who work (or have worked) at Intel.
George_George wrote: "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does "linear address" mean? Is it something like an array, whose elements are placed next to each other? And what would a non-linear address be?
To understand what's going on here you have to know a little about Intel CPUs and segment registers. Basically, C/C++ has no concept of segment registers and such (it assumes a linear address space), so this is a page-table mapping trick done by the OS to make the TEB addressable in such an environment.
1. From the same article, just below the sentence you quoted:
"The LEA instruction uses hardwired address generation tables that makes multiplying by a select set of numbers very fast (for example, multiplying by 3, 5, and 9). Twisted, but true."
That means the LEA instruction is faster than MUL only for a small set of multiplier values.
2. George_George wrote: "The TEB's linear address can be found at offset 0x18 in the TEB." -- what does "linear address" mean? Is it something like an array, whose elements are placed next to each other?
I think it means direct address, i.e.
mov eax,dword ptr fs:[00000018h]
loads eax with the address of the TEB, hence the following instruction
mov eax,dword ptr [eax+24h]
loads eax with the value found at offset 0x24 in the TEB (the ThreadID).
George_George wrote: And what would a non-linear address be?
I suppose it is indirect addressing (via the FS register in this context).
:)
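The two-instruction sequence above can be mimicked with a toy model. This is only a sketch: it does not touch a real TEB, and `FS_BASE`, `THREAD_ID`, `memory` and the helper names are hypothetical values and names chosen for illustration. It shows that a read through FS is relative to the segment base, while the value stored at offset 0x18 is an ordinary flat address that can be dereferenced without FS.

```python
FS_BASE = 0x7FFDE000  # hypothetical linear address of the TEB (FS segment base)
THREAD_ID = 0x1A2C    # hypothetical thread ID

# Sparse model of flat (linear) memory: address -> 32-bit value.
memory = {
    FS_BASE + 0x18: FS_BASE,    # the TEB stores its own linear address here
    FS_BASE + 0x24: THREAD_ID,  # thread ID field
}


def read_fs_dword(offset):
    """Model of: mov reg, dword ptr fs:[offset] (segment-relative read)."""
    return memory[FS_BASE + offset]


def read_dword(linear_address):
    """Model of: mov reg, dword ptr [addr] (read through a flat pointer)."""
    return memory[linear_address]


def current_thread_id():
    eax = read_fs_dword(0x18)      # mov eax,dword ptr fs:[00000018h]
    return read_dword(eax + 0x24)  # mov eax,dword ptr [eax+24h]
```

In this model, `read_fs_dword(0x24)` and `current_thread_id()` return the same value; the second route goes through a plain pointer, which is the kind of address C/C++ code can use directly.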
Some more reading material: Pentium Optimization Cross-Reference.
From the page: LEA is better than SHL on the Pentium because it pairs in both pipes, SHL pairs only in the U pipe.
Also, as CPallini pointed out, the document states that LEA can be more beneficial than MUL only when multiplying by 2, 3, 4, 5, 7, 8 or 9.
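To see why only a handful of multipliers are available, here is a small sketch of the arithmetic LEA performs (base + index*scale + displacement). It models the result only, not the timing, and the names `lea32`, `times3`, `times5`, `times9` and `single_lea_multipliers` are made up for this example.

```python
MASK32 = 0xFFFFFFFF  # results wrap to 32 bits, as they would in EAX


def lea32(base, index, scale, disp=0):
    """Model 32-bit LEA arithmetic: base + index*scale + disp (mod 2**32)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors 1, 2, 4 and 8")
    return (base + index * scale + disp) & MASK32


def times3(x):
    return lea32(x, x, 2)  # LEA EAX,[EAX*2+EAX]


def times5(x):
    return lea32(x, x, 4)  # LEA EAX,[EAX*4+EAX]


def times9(x):
    return lea32(x, x, 8)  # LEA EAX,[EAX*8+EAX]


def single_lea_multipliers():
    """Multipliers above 1 reachable with one LEA of the form [x*scale] or [x*scale+x]."""
    found = set()
    for scale in (1, 2, 4, 8):
        found.add(scale)      # [x*scale], no base register
        found.add(scale + 1)  # [x*scale + x]
    found.discard(1)          # multiplying by 1 is not interesting
    return sorted(found)
```

Here `times5(7)` gives 35 and `single_lea_multipliers()` returns `[2, 3, 4, 5, 8, 9]`. 7 is absent from that list, so the document's list presumably counts a short follow-up instruction as well (for example, a subtraction after multiplying by 8); that reading is an assumption, not something confirmed in this thread.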