Do different general-purpose registers take more cycles?


Question

I have a loop in my Intel vector assembly code. In the loop, the loop counter is used to read from and write to 4 consecutive memory locations. For example,

vmovdqu [r9 + rdx + 64], y0
vmovdqu [r9 + rdx + 96], y1





where rdx is my loop counter. During profiling, I noticed that using "r10d" instead of "rdx" as the counter register increases the cycle count. The initialisation of "r10d" takes 1 byte more than that of "rdx". What could be the reason for the cycle increase?
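
The extra byte mentioned above is consistent with the REX prefix that x86-64 needs in order to select r8 to r15. The following is a minimal sketch, not the poster's code: it assumes the counter is initialised with a `mov r32, imm32`, which the post does not show, and the helper name `encode_mov_r32_imm32` is hypothetical.

```python
# Sketch: hand-encode `mov r32, imm32` to compare instruction lengths.
# Assumption: the counter is initialised with a 32-bit immediate move;
# the original post does not show the actual instruction.
import struct

# Register numbers as used by the x86-64 instruction encoding.
REG32 = {
    "eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
    "esp": 4, "ebp": 5, "esi": 6, "edi": 7,
    "r8d": 8, "r9d": 9, "r10d": 10, "r11d": 11,
    "r12d": 12, "r13d": 13, "r14d": 14, "r15d": 15,
}


def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Return the machine code for `mov reg, imm` (opcode B8+rd id)."""
    num = REG32[reg]
    # Registers 8..15 need a REX prefix with the B bit set (0x41).
    prefix = b"\x41" if num >= 8 else b""
    # Only the low 3 bits of the register number fit in the opcode byte.
    opcode = bytes([0xB8 + (num & 7)])
    # The immediate is stored as 4 little-endian bytes.
    return prefix + opcode + struct.pack("<I", imm & 0xFFFFFFFF)


if __name__ == "__main__":
    for reg in ("edx", "r10d"):
        code = encode_mov_r32_imm32(reg, 0)
        print(f"mov {reg}, 0 -> {code.hex(' ')} ({len(code)} bytes)")
```

A longer encoding does not by itself make the instruction slower, but it shifts the address of everything that follows it, so the loop body may end up aligned differently in memory.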

Recommended answer

Well, you may find that information in the documentation called the Instruction Manual, provided by the manufacturer of the processor. Every processor manufacturer provides this documentation, but yeah, it is not free. You have to pay for that. :)

-KR


Hi KR, I have the instruction manual, but it does not give me any pointers.
The closest I came to a solution is here: http://stackoverflow.com/questions/17896714/why-would-introducing-useless-mov-instructions-speed-up-a-tight-loop-in-x86-64-a

