How does the address operand affect the performance and size of machine code?

Question

Starting with 32-bit CPU mode, extended address operands are available on the x86 architecture. One can specify a base address, a displacement, an index register, and a scaling factor.

For example, suppose we want to stride through a list of 32-bit integers (the first two from each element of an array of 32-byte data structures, with %rdi as the data index and %rbx as the base pointer).

addq   $8, %rdi                # skip eight values: advance index by 8
movl   (%rbx, %rdi, 4), %eax   # load data: pointer + scaled index
movl   4(%rbx, %rdi, 4), %edx  # load data: pointer + scaled index + displacement

As far as I know, such complex addressing fits into a single machine-code instruction. But what is the cost of such an operation, and how does it compare to simple addressing with independent pointer calculation:

addq  $32, %rbx      # skip eight values: move pointer forward by 32 bytes
movl  (%rbx), %eax   # load data: pointer
addq  $4, %rbx       # point to next value: move pointer forward by 4 bytes
movl  (%rbx), %edx   # load data: pointer

In the latter example, I have introduced one extra instruction and a dependency. But integer addition is very fast, I gained simpler address operands, and there are no multiplications any more. On the other hand, since the allowed scaling factors are powers of 2, the multiplication comes down to a bit shift, which is also a very fast operation. Still, two additions and a bit shift can be replaced with one addition.

What are the performance and code size differences between these two approaches? Are there any best practices for using the extended addressing operands?

Or, asking from a C programmer's point of view, which is faster: array indexing or pointer arithmetic?
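To make the C-level version of the question concrete, here is a minimal sketch of the two styles applied to the same data layout. The struct record type and the function names sum_indexed and sum_pointer are hypothetical, chosen only for illustration. Both functions read the first two 32-bit fields of each 32-byte record and return the same total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte record; only the first two 32-bit fields are read. */
struct record {
    int32_t first;
    int32_t second;
    int32_t padding[6];
};

static_assert(sizeof(struct record) == 32, "record must be 32 bytes");

/* Array indexing: each access is written as base plus an index. */
int64_t sum_indexed(const struct record *base, size_t count)
{
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += base[i].first;
        total += base[i].second;
    }
    return total;
}

/* Pointer arithmetic: the pointer itself advances by one record per step. */
int64_t sum_pointer(const struct record *base, size_t count)
{
    int64_t total = 0;
    for (const struct record *p = base, *end = base + count; p != end; p++) {
        total += p->first;
        total += p->second;
    }
    return total;
}
```

An optimizing compiler is free to pick either addressing form for either loop, so the source-level style usually does not decide the outcome by itself. Inspecting the generated assembly is the reliable way to see what was chosen.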

Is there any assembly editor meant for size/performance tuning? I wish I could see the machine-code size of each assembly instruction, its execution time in clock cycles, or a dependency graph. There are thousands of assembly freaks who would benefit from such an application, so I bet that something like this already exists!

Recommended answer

Address arithmetic is very fast and should be used whenever possible.

But there is something that the question misses.

First, you can't multiply by 32 using address arithmetic: 8 is the largest possible scale factor.
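The scale-factor limit can be illustrated with a small C model of the address calculation. This is only a sketch: effective_address and record_field_address are hypothetical helper names, and the code models the arithmetic of a memory operand without generating any instructions. It shows why the question's code advances the index by 8 and scales by 4 to get a 32-byte stride.

```c
#include <assert.h>
#include <stdint.h>

/* Models base + index * scale + displacement, as formed by one x86
   memory operand. Only scale factors 1, 2, 4 and 8 can be encoded. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, int32_t displacement)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)(intptr_t)displacement;
}

/* A 32-byte stride is not a valid scale factor, so the record number is
   first converted to an index in 4-byte units (8 units per record). */
uintptr_t record_field_address(uintptr_t base, uintptr_t record,
                               int32_t field_offset)
{
    uintptr_t index = record * 8;   /* mirrors advancing the index by 8 */
    return effective_address(base, index, 4, field_offset);
}
```

For example, with a base of 0x1000, an index of 8 and a scale of 4, the model yields 0x1020, which is exactly 32 bytes past the base.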

The first version of the code is not complete, because it needs a second instruction to increment rbx. So we have the following two variants:

inc  rbx                  ; advance the index by one element
mov  eax, [8*rbx+rdi]     ; load from base rdi + index rbx * 8

VS

add  rbx, 8               ; advance the pointer by 8 bytes
mov  eax, [rbx]           ; load from the pointer

Written this way, the two variants run at the same speed. The size is the same as well: 6 bytes each.
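The 6-byte figure can be checked by adding up the instruction encodings. The bytes below are hand-assembled x86-64 encodings, which is an assumption on my part: an assembler that picks the shortest form should emit the same bytes, but it is worth confirming against a real disassembly. The array and constant names are hypothetical.

```c
#include <assert.h>

/* Variant 1: increment the index, then load with scaled-index addressing. */
const unsigned char inc_rbx[]    = {0x48, 0xFF, 0xC3};        /* inc rbx */
const unsigned char mov_scaled[] = {0x8B, 0x04, 0xDF};        /* mov eax, [8*rbx+rdi] */

/* Variant 2: advance the pointer, then load with plain register addressing. */
const unsigned char add_rbx_8[]  = {0x48, 0x83, 0xC3, 0x08};  /* add rbx, 8 */
const unsigned char mov_plain[]  = {0x8B, 0x03};              /* mov eax, [rbx] */

enum {
    VARIANT1_SIZE = sizeof inc_rbx + sizeof mov_scaled,   /* 3 + 3 bytes */
    VARIANT2_SIZE = sizeof add_rbx_8 + sizeof mov_plain   /* 4 + 2 bytes */
};

static_assert(VARIANT1_SIZE == VARIANT2_SIZE, "both variants have equal size");
```

The scaled load costs one extra byte (the SIB byte) compared with the plain load, while inc is one byte shorter than add with an immediate, so the totals come out equal.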

So which code is better depends only on the program context. If we have a register that already contains the address of the needed array cell, use mov eax, [rbx].

If we have one register containing the index of the cell and another containing the start address, then use the first variant. This way, after the algorithm ends, we will still have the start address of the array in rdi.
