Declaring and indexing an integer array of qwords in assembly

Question

I have a question regarding how to initialize an array in assembly. I tried:

.bss
#the array
unsigned:    .skip 10000
.data
#these are the values that I want to put in the array
par4:   .quad 500 
par5:   .quad 10
par6:   .quad 15

That's how I declared my array and the variables that I want to put inside it. This is how I tried to put them into the array:

movq $0 , %r8

movq par4 , %rax
movq %rax , unsigned(%r8)
incq %r8

movq par5 , %rax
movq %rax , unsigned(%r8)
incq %r8

movq par6 , %rax
movq %rax , unsigned(%r8)

I tried printing the elements to check if everything is okay, and only the last one prints okay, the other two have some weird values.

Maybe this is not the way I should declare and work with it?

Recommended answer

First of all, unsigned is the name of a type in C, so it's a poor choice for an array. Let's call it arr instead.

You want to treat that block of space in the BSS as an array of qword elements. Each element is 8 bytes, so you need to store to arr+0, arr+8, and arr+16. (The total size of your array is 10000 bytes, which is 10000/8 = 1250 qwords.)
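As a quick sanity check of that arithmetic, here is a small sketch in Python, used only as a calculator and not as a replacement for the assembly. The names `ELEM_SIZE`, `ARRAY_BYTES`, and `byte_offset` are made up for illustration.

```python
ELEM_SIZE = 8          # a qword is 8 bytes
ARRAY_BYTES = 10000    # size reserved by `.skip 10000`

def byte_offset(index):
    """Byte offset of element `index` from the start of the array."""
    return index * ELEM_SIZE

offsets = [byte_offset(i) for i in range(3)]
capacity = ARRAY_BYTES // ELEM_SIZE

print(offsets)    # [0, 8, 16]
print(capacity)   # 1250
```

The multiplication by 8 is the step that either the addressing mode has to do (a scale factor of 8) or the code has to do itself (adding 8 to a byte offset).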

But you're using %r8 as a byte offset, not a scaled index. That's generally a good thing, all else being equal; indexed addressing modes are slower in some cases on some CPUs. The problem is that you only increment it by 1 with inc, instead of by 8 with add $8, %r8.

So you're actually storing to arr+0, arr+1, and arr+2, with 8-byte stores that overlap each other, leaving just the least-significant byte of each of the first two stores intact. x86 is little-endian, so the resulting contents of memory are effectively this, followed by the rest of the unwritten bytes, which stay zero.

# static array that matches what you actually stored
arr: .byte 500 & 0xFF, 10, 15, 0, 0, 0, 0, 0, 0, 0, ...
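
To see where those bytes come from, the overlapping stores can be modelled in a few lines of Python. This is only a sketch of the memory effects, assuming zero-filled memory and little-endian 8-byte stores; `mem` and `store_qword` are hypothetical names, not anything from the assembly.

```python
import struct

mem = bytearray(32)    # zero-filled, like fresh BSS memory

def store_qword(offset, value):
    """Model `movq %rax, arr(%r8)`: an 8-byte little-endian store."""
    struct.pack_into("<Q", mem, offset, value)

# The question's code: the offset grows by 1 (incq), not by 8.
for offset, value in enumerate([500, 10, 15]):
    store_qword(offset, value)

first_bytes = list(mem[:4])
elements = struct.unpack_from("<3Q", mem, 0)

print(first_bytes)   # [244, 10, 15, 0]
print(elements)      # (985844, 0, 0)
```

Reading the memory back as qwords gives 0x0F0AF4 for element 0 and zero for the next two, which is why the values look wrong when printed as array elements.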

You could of course just use .quad in the .data section to declare a static array with the contents you want. But with only the first 3 elements non-zero, putting an array that large in the BSS makes sense, instead of having the OS page in the zeros from disk.

If you're going to fully unroll instead of using a loop over your 3-element qword array starting at par4, you don't need to increment a register at all. You also don't need the initializers to be in data memory; you can just use immediates, because they all fit in a sign-extended 32-bit immediate.

  # these are assemble-time constants, not associated with a section
.equ par4, 500
.equ par5, 10
.equ par6, 15

.text  # already the default section but whatever

.globl _start
_start:
    movq    $par4, arr(%rip)            # use RIP-relative addressing when there's no register
    movq    $par5, arr+8(%rip)
    movq    $par6, arr+16(%rip)

    mov $60, %eax         # __NR_exit
    xor %edi, %edi        # exit status 0
    syscall               # Linux exit(0)

.bss
    arr:   .skip 10000

You can run that under GDB and examine memory to see what you get. (Compile it with gcc -nostdlib -static foo.s). In GDB, start the program with starti (to stop at the entry point), then single-step with si. Use x /4g &arr to dump the contents of memory at arr as an array of 4 qwords.
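
For comparison with the GDB dump, here is a Python model of what the three correctly spaced stores leave in memory. It is a sketch that assumes a zero-filled array and little-endian qwords; it reproduces only the values, not GDB's output format (GDB prints addresses and may show the values in hexadecimal, where 500 is 0x1f4).

```python
import struct

mem = bytearray(10000)   # models `arr: .skip 10000`, zero-filled

# Model the three stores to arr, arr+8, and arr+16.
for index, value in enumerate([500, 10, 15]):
    struct.pack_into("<Q", mem, index * 8, value)

first_four = struct.unpack_from("<4Q", mem, 0)
print(first_four)   # (500, 10, 15, 0)
```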

Or if you did want to use a register, you might as well increment a pointer instead of an index.

    lea     arr(%rip), %rdi           # or mov $arr, %edi in a non-PIE executable
    movq    $par4, (%rdi)
    add     $8, %rdi                  # advance the pointer 8 bytes = 1 element
    movq    $par5, (%rdi)
    add     $8, %rdi
    movq    $par6, (%rdi)

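Or with scaled-index addressing (arr(,%rax,8) encodes arr as a 32-bit absolute address, so this form only links in a non-PIE executable):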

## Scaled-index addressing
    movq    $par4, arr(%rip)
    mov     $1, %eax
    movq    $par5, arr(,%rax,8)       # [arr + rax*8]
    inc     %eax
    movq    $par6, arr(,%rax,8)


Fun trick: you could just do a byte store instead of a qword store to set the low byte, and leave the rest zero. This would save code-size but if you did a qword load right away, you'd get a store-forwarding stall. (~10 cycles extra latency for the store/reload to merge data from the cache with the store from the store buffer)
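
A small Python model shows when the byte-store trick gives the right qword. It assumes the destination is still zero-filled, and `qword_after_byte_store` is a hypothetical helper; the `movb` in the comment is an illustrative instruction, not one taken from the answer.

```python
import struct

def qword_after_byte_store(value):
    """Store only the low byte of `value` into a zeroed qword, then load all 8 bytes."""
    mem = bytearray(8)           # zeroed, like untouched BSS
    mem[0] = value & 0xFF        # models a 1-byte store such as `movb $10, arr+8(%rip)`
    return struct.unpack_from("<Q", mem, 0)[0]

results = [qword_after_byte_store(v) for v in (500, 10, 15)]
print(results)   # [244, 10, 15]
```

The trick only reproduces values in the range 0 to 255, so it works for 10 and 15 here, but 500 would still need a wider store.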

Or if you did still want to copy 24 bytes from par4 in .rodata, you could use SSE. x86-64 guarantees that SSE2 is available.

    movaps   par4(%rip), %xmm0
    movaps   %xmm0, arr(%rip)          # copy par4 and par5

    mov      par6(%rip), %rax          # aka par4+16
    mov      %rax, arr+16(%rip)

.section .rodata          # read-only data.
.p2align 4         # align by 2^4 = 16 for movaps
  par4:  .quad 500
  par5:  .quad 10
  par6:  .quad 15

.bss
.p2align 4        # align by 16 for movaps
  arr: .skip 10000
# or use .lcomm arr, 10000  without even switching to .bss

Or with SSE4.1, you can load+expand small constants so you don't need a whole qword for each small number that you're going to copy into the BSS array.

    pmovzxwq   initializers(%rip), %xmm0       # zero-extend 2 words into 2 qwords
    movaps     %xmm0, arr(%rip)
    movzwl     initializers+4(%rip), %eax      # zero-extending word load
    mov        %rax, arr+16(%rip)

.section .rodata
  initializers: .word 500, 10, 15
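
The data movement in that last version can be modelled in Python as well. This is a sketch of the semantics only: it assumes little-endian 16-bit initializers and a zero-filled destination, and the variable names are made up for illustration.

```python
import struct

initializers = struct.pack("<3H", 500, 10, 15)   # models `.word 500, 10, 15` (6 bytes)
arr = bytearray(32)                              # zero-filled destination

# Models the zero-extending vector load: 2 words become 2 qwords (16 bytes),
# which are then stored to the start of the array.
lo, hi = struct.unpack_from("<2H", initializers, 0)
struct.pack_into("<2Q", arr, 0, lo, hi)

# Models `movzwl initializers+4(%rip), %eax` followed by the 8-byte store to arr+16.
(third,) = struct.unpack_from("<H", initializers, 4)
struct.pack_into("<Q", arr, 16, third)

result = struct.unpack_from("<4Q", arr, 0)
print(result)   # (500, 10, 15, 0)
```

The constants take 6 bytes in read-only data instead of 24, at the cost of requiring SSE4.1 for the widening load.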
