返回数组中最小整数的汇编代码,而不是随机返回最后一个或倒数第二个数字 [英] Assembly code to return smallest integer in array instead randomly returns either last or second to last number

查看:49
本文介绍了返回数组中最小整数的汇编代码,而不是随机返回最后一个或倒数第二个数字的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图在 nasm 中创建一个函数,给定一个整数数组和数组的长度,返回最小的整数.这是基于 CodeWars 问题查找数组中的最小整数".我在 64 位 BlackArch Linux 上执行此操作.我的函数如下所示:

SECTION .text全局 find_smallest_intfind_smallest_int:;[rdi] 是数组中的第一个值.;我们将存储迄今为止找到的最小值;在 rax.数组中的第一个值是;迄今为止发现的最小的,因此我们存储它;在 rax.mov rax, [rdi];rsi 是 int find_smallest_int(int *, int) 的第二个参数;它表示数组的长度.;将其存储在 rbx 中以明确.mov rbx, rsi环形:;检查我们是否已经到达数组的末尾.;如果有,我们跳到函数的末尾,然后;返回最小值(应该是任何;目前在 rax 中.cmp rbx, 0je end;从我们的计数器中减去一个.这开始于;数组中的元素数 - 当它;到 0,我们将遍历整个事情.子 rbx, 1;如果 rax 小于 [rdi],我们将跳到;循环的其余部分.只有当 rax 大于 [rdi] 时才会;我们将 rax 重新分配为新的最小但 vaue.cmp rax,[rdi]jl 后分配分配:;如果我们执行这段代码,就意味着 rax 不会少;比[rdi].因此,我们可以安全地重新分配;rax 到 [rdi].mov rax, [rdi]后分配:;将 rdi 设置为指向数组中的下一个值添加 rdi, 4;如果我们到了这里,那么我们还没有完成循环;因为 rbx(计数器)还没有达到 0.jmp循环结尾:回复

然后我通过以下 C 代码调用此函数:

extern int find_smallest_int(int *array, int size);int main(void){int nums[4] = {800, 300, 100, 11};int ret = find_smallest_int(nums, 4);返回 ret;}

最后,我使用以下命令编译并运行整个程序:

#!/bin/bash# 使用 nasm 从我的汇编代码中创建一个目标文件nasm -f elf64 -o sum.o call_sum.s# 从我的 C 代码创建一个目标文件gcc -O0 -m64 -c -o call_sum.o call_sum.c -g# 将我的两个目标文件编译成一个可执行文件gcc -O0 -m64 -o 运行 sum.o call_sum.o -g# 运行可执行文件并获取输出# 退出代码的形式../跑步回声 $?

我没有得到最小的整数,而是得到 100 或 11(分别传递给我的汇编函数的整数数组的倒数第二个和最后一个成员).我得到的结果似乎是完全随机的.我可以运行程序几次得到 11,然后再运行几次,然后开始得到 100.

如果有人能帮助我理解这种奇怪的行为,我将不胜感激.谢谢!

更新:我实现了 Jester 评论中的更改(使用 32 位寄存器来保存整数)并且它有效,但我不明白为什么.

解决方案

这个答案的开头是基于 Jester 的评论.它只是对此进行了扩展,并更详细地解释了这些变化.我也做了一些额外的更改,其中两个也解决了您来源中的错误.

首先,这部分:

<块引用>

int 是 4 个字节,但您在整个代码中使用了 8 个字节.使用 eax 而不是 rax.

示例中的这些指令每个都访问数组中的 8 个字节:

 mov rax, [rdi]cmp rax,[rdi]mov rax, [rdi]

这是因为 rax 是一个 64 位寄存器,因此执行完整的 rax 加载或与内存操作数进行比较会访问 8 字节的内存.在 NASM 语法中,您可以明确指定内存操作数的大小,例如通过编写以下内容:

 mov rax, qword [rdi]

如果您这样做了,您可能更早发现您正在以 8 字节为单位(四字)访问内存.尝试在使用 rax 作为目标寄存器时显式访问双字会失败.以下行在汇编时导致错误操作数大小不匹配":

 mov rax, dword [rdi]

以下两行都很好,都从双字内存操作数加载到 rax 中.第一个使用零扩展(在写入 32 位寄存器部分时隐含在 AMD64 指令集中),第二个使用(显式)符号扩展:

 mov eax, dword [rdi]movsx rax, dword [rdi]

(从双字内存操作数到 raxmovzx 指令不存在,因为 mov 到 eax 将是多余的.)

稍后在您的示例中,您使用 rdi 作为 4 字节宽类型的地址,通过向其添加 4 来推进数组入口指针:

 添加 rdi, 4

这对于 int 类型是正确的,但与您使用四字作为内存操作数的大小相冲突.

Jester 的评论给出了另外两个问题:

<块引用>

也不要使用rbx,因为这是一个被调用者保存的寄存器,无论如何从rsi 复制是没有意义的.和以前一样,你最好使用 esi 因为那是另一个 int.

rsi 问题是 64 位 rsi 的高 32 位可能包含非零值,具体取决于 ABI.如果您不确定是否允许使用非零值,您应该假设它是,并且您应该只在 esi 中使用 32 位值.

rbx(或 ebx)的问题是 rbx 需要在 Linux 使用的 AMD64 psABI 的函数调用中保留,请参阅 x86-64 System V ABI 的文档在哪里?,了解该 ABI 的文档.在您的简单测试程序中,更改 rbx 可能不会导致任何失败,但在非平凡的上下文中很容易导致失败.

我发现的下一个问题是您对 eax 的初始化.你是这样写的:

 ;[rdi] 是数组中的第一个值.;我们将存储迄今为止找到的最小值;在 rax.数组中的第一个值是;迄今为止发现的最小的,因此我们存储它;在 rax.mov rax, [rdi]

但是,正如您的循环流控制逻辑所证明的那样,您允许调用者为 size 参数传入零.在这种情况下,您根本不应该访问数组,因为数组中的第一个值"甚至可能不存在或被初始化为任何东西.从逻辑上讲,您应该使用 INT_MAX 而不是第一个数组条目来初始化最小的值.

还有一个问题:您使用 rsiesi 作为无符号数,倒计时为零.但是,在函数声明中,您将 size 参数的类型指定为带符号的 int.我通过将声明更改为 unsigned int 来解决这个问题.

我对您的程序进行了一些可选更改.我将 NASM 本地标签用于您的函数的子"标签,这很有用,因为您可以在其他函数中重复使用例如 .loop.end如果要添加任何相同的源文件.

我还更正了其中一条评论,指出我们会因为 eax 小于数组条目而跳转,并且不会因为 eax 大于或等于数组条目而跳转.您可以将此条件跳转更改为 jle 代替它也将跳转进行相等比较.可以说,出于清晰度或性能的考虑,可能更喜欢一个或另一个,但我对哪一个没有太多答案.

我还使用了 dec esi 而不是 sub esi, 1 这不是更好,但只是更适合我.在 32 位模式下,dec esi 是单字节指令.但在 64 位模式下并非如此;dec esi 是 2 个字节,而 sub esi, 1 是 3 个字节.

此外,我将 esi 为零的初始检查从使用 cmp 更改为 test,稍微好一点,参考 使用 CMP reg,0 vs OR reg,reg 测试寄存器是否为零?

最后,我将实际循环条件更改为在循环体的末尾,这意味着循环少使用了一条跳转指令.循环体开始处的无条件跳转被检查while 条件的条件跳转替换.仍然需要函数开头的 test 来处理零长度数组的可能性.此外,我没有再次使用 cmptest 来检查 esi 中的零,而是使用 esi 中已经设置的零标志code>dec 指令检查 esi 是否递减为零.

您可以使用 ecxrcx 作为循环计数器,但这在现代 CPU 上可能不会有太大的好处.如果您使用 jrcxzjecxzloop 指令,将允许稍微紧凑的代码.但不推荐使用它们,因为性能较慢.

不是与 dword [rdi] 比较,然后,如果小于或等于 eax,从相同的内存 dword 加载,您可以加载数组首先将条目的值存入寄存器,然后将其用作 cmpmov 的源.这可能更快,但确实会导致更多的操作码字节.

我用来将目标索引(64 位模式中的 rdi)提前 4 的一个技巧是使用单个 scasd 指令,该指令仅修改标志和索引寄存器.这是一条单字节指令,而不是 4 字节的 add rdi, 4,但很可能运行速度较慢.

我上传了一个包含您的原始来源和我对 https://hg.ulukai.org/ecm/testsmal/file/2b8637ca416a/的改进的存储库hg.ulukai.org/ecm/testsmal/file/2b8637ca416a/(在CC BY-SA的stackoverflow内容使用条件下拍摄.)我也修改了C部分和测试脚本,但这些都是琐碎的,主要是与你的问题无关.汇编源代码如下:

<预><代码>INT_MAX 等于 7FFF_FFFFh节.text全局 find_smallest_intfind_smallest_int:;如果数组为空(大小 = 0),那么我们要返回;根本没有从数组中读取.要返回的值;那么逻辑上应该是最大可能的数字;32 位有符号整数.这在 C 中称为 INT_MAX;header limit.h 和 32 位 int 等于 7FFF_FFFFh.;;如果数组不为空,则第一次迭代将;总是让我们的结果寄存器等于中的值;第一个数组条目.这要么等于 INT_MAX;再次,或少于那个.移动 eax, INT_MAX;esi 是我们函数的第二个参数,它是;声明为 int find_smallest_int(int *, unsigned int).;它表示数组的长度.我们用这个;作为计数器.rsi(及其部分 esi)不需要保留;跨函数调用使用的 AMD64 psABI;Linux,见 https://stackoverflow.com/a/40348010/738287;检查 esi 中的初始零值.如果找到这个,;跳过循环而不进行任何迭代(而 x 做 y)和;在开始时返回初始化为 INT_MAX 的 eax.测试 esi, esijz.end.环形:;如果 eax 小于 dword [rdi],我们将跳到;循环的其余部分.仅当 eax 大于或等于;dword [rdi] 我们将 eax 重新分配给那个,以保持;新的最小值.cmp eax, 双字 [rdi]jl .postassign.分配:;如果我们执行这段代码,就意味着 eax 不会少;比 dword [rdi].因此,我们可以安全地重新分配;eax 到 dword [rdi].mov eax, dword [rdi].postassign:;将 rdi 设置为指向数组中的下一个值.添加 rdi, 4;从我们的计数器中减去一个.这开始于;数组中的元素数 - 当它;到 0,我们将遍历整个事情.德西;检查我们是否已经到达数组的末尾.;为此,我们使用先前设置的零标志;十二月指令.如果 esi 已经达到零 (ZR) 那么;我们不继续循环.在这种情况下,我们返回;找到的最小值(目前在 eax 中).;;否则,我们跳到循环的开始以开始下一个;迭代.循环.结尾:返回

这是循环体内条件跳转的替代方法.cmov 指令似乎被所有 AMD64 CPU 支持.这是一个有条件的移动:如果满足条件,它会像 mov 一样运行——否则它没有任何影响,只有一个例外:它可能会读取源操作数(因此可能会因内存访问而出错)).您可以翻转用于围绕 mov 的分支的条件以获得 cmov 指令的条件.(我遇到了 这个线程涉及 Linus Torvalds,这表明条件跳转解决方案实际上可能比 cmov 好或不差.按照你的意愿去做.)

.loop:;如果 eax 大于或等于 dword [rdi],我们将;将 eax 重新分配给那个 dword,新的最小的值.cmp eax, 双字 [rdi]cmovge eax, 双字 [rdi];将 rdi 设置为指向数组中的下一个值.添加 rdi, 4

I'm trying to create a function in nasm which, given an array of integers and the length of the array, returns the smallest integer. This is based on the CodeWars problem "Find the smallest integer in the array". I'm doing this on 64 bit BlackArch Linux. My function looks like this:

SECTION .text
global find_smallest_int

find_smallest_int:
  ; [rdi] is the first value in the array.
  ; We'll store the smallest value so far found
  ; in rax. The first value in the array is the
  ; smallest so far found, therefore we store it
  ; in rax.
  mov rax, [rdi]

  ; rsi is the second argument to int find_smallest_int(int *, int)
  ; which represents the length of the array.
  ; Store it in rbx to be explicit.
  mov rbx, rsi

  loop:
    ; Check to see if we've reached the end of the array.
    ; If we have, we jump to the end of the function and 
    ; return the smallest value (which should be whatever
    ; is in rax at the moment.
    cmp rbx, 0
    je end

    ; Subtract one from our counter. This started as 
    ; the number of elements in the array - when it
    ; gets to 0, we'll have looped through the entire thing.
    sub rbx, 1

    ; If rax is smaller than [rdi], we'll jump down to the
    ; rest of the loop. Only if rax is bigger than [rdi] will
    ; we reassign rax to be the new smallest-yet vaue.
    cmp rax, [rdi]
    jl postassign

    assign:
      ; If we execute this code, it means rax was not less
      ; than [rdi]. Therefore, we can safely reassign
      ; rax to [rdi].
      mov rax, [rdi]


    postassign:
    ; Set rdi to point to the next value in the array
    add rdi, 4

    ; if we get here, then we aren't finishing looping yet
    ; because rbx (the counter) hasn't eached 0 yet.
    jmp loop

  end:
    ret

I then call this function via the following C code:

extern int find_smallest_int(int *array, int size);

int main(void)
{
    int nums[4] = {800, 300, 100, 11};
    int ret = find_smallest_int(nums, 4);

    return ret;
}

Finally, I compile and run the whole thing using the following commands:

#!/bin/bash

# Make an object file from my assembly code with nasm
nasm -f elf64 -o sum.o call_sum.s

# make an object file from my C code
gcc -O0 -m64 -c -o call_sum.o call_sum.c -g

# compile my two object files into an executable
gcc -O0 -m64 -o run sum.o call_sum.o -g

# Run the executable and get the output in the
# form of the exit code.
./run
echo $?

Instead of getting the smallest integer, I either get 100 or 11 (the second to last and last members of the integer array that I pass to my assembly function, respectively). Which result I get appears to be completely random. I can run the program a few times and get 11, then run it a few more and then start getting 100.

If anyone could help me understand this strange behavior I'd appreciate it immensely. Thanks!

Update: I implemented the changes from Jester's comment (using 32bit registers to hold ints) and it works, but I do not really understand why.

解决方案

The beginning of this answer is based on what Jester commented on. It just expands on that and explains the changes in some more detail. I did some additional changes too, two of which are also addressing mistakes in your sources.

First, this part:

An int is 4 bytes but you use 8 all over your code. Use eax instead of rax.

These instructions in your example are accessing 8 bytes from the array each:

    mov rax, [rdi]

    cmp rax, [rdi]

    mov rax, [rdi]

This is because rax is a 64-bit register, so doing a full rax load or compare with a memory operand accesses 8 bytes of memory. In NASM syntax, you are allowed to specify the memory operand's size explicitly, such as by writing the following:

    mov rax, qword [rdi]

If you had done that you might have figured out earlier that you were accessing memory in 8-byte units (quadwords). Trying to access a doubleword explicitly while using rax as the destination register would fail. The following line causes the error "mismatch in operand sizes" at assembly time:

    mov rax, dword [rdi]

The following two lines are fine, and both load into rax from a doubleword memory operand. The first one uses zero-extension (which is implicit in the AMD64 instruction set when writing to a 32-bit register part), the second one uses (explicit) sign extension:

    mov eax, dword [rdi]
    movsx rax, dword [rdi]

(A movzx instruction from a dword memory operand to rax does not exist, as it would be redundant with the mov to eax.)

Later in your example, you're using rdi as the address of a 4-byte wide type, advancing the array entry pointer by adding 4 to it:

    add rdi, 4

This is correct for the int type, but clashes with your use of quadwords as the memory operands' sizes.

Two more issues are given by Jester's comment:

Also do not use rbx because that is a callee-saved register and it's pointless to copy from rsi anyway. As before, you'd better use esi because that's another int.

The rsi problem is that the high 32 bits of the 64-bit rsi might hold nonzero values depending on the ABI. If you aren't sure about whether a nonzero value is allowed, you should assume it is and that you should use the 32-bit value in esi only.

The rbx (or ebx) problem is that rbx needs to be preserved across function calls for the AMD64 psABI that is used by Linux, see Where is the x86-64 System V ABI documented? for the documentation of that ABI. In your simple test program changing rbx may not cause any failure, but it easily would in a non-trivial context.

The next problem I found is with your initialisation of eax. You wrote it this way:

  ; [rdi] is the first value in the array.
  ; We'll store the smallest value so far found
  ; in rax. The first value in the array is the
  ; smallest so far found, therefore we store it
  ; in rax.
  mov rax, [rdi]

However, as evidenced by your loop flow-control logic, you allow for the caller to pass in a zero for the size argument. In that case, you should not access the array at all, because "the first value in the array" may not even exist or be initialised to anything at all. Logically, you should initialise the smallest-yet value with INT_MAX instead of the first array entry.

There is one more problem: You're using rsi or esi as an unsigned number, counting down to zero. However, in your declaration of the function you specified the type of the size argument as int, which is signed. I fixed that by changing the declaration to unsigned int.

I did a few more optional changes to your program. I used NASM local labels for the "sub"-labels of your function, which is useful because you could re-use for example .loop or .end in other functions in the same source file if there were any to be added.

I also corrected one of the comments to note that we jump for eax being less than the array entry, and do not jump for eax being greater than or equal to the array entry. You could change this conditional jump to be jle instead which would jump for equal comparisons too. Arguably one or the other may be preferred for clarity, or performance, but I don't have much of an answer as to which.

I also used dec esi instead of sub esi, 1 which is not very much better but just sits better with me. In 32-bit mode, dec esi is a single-byte instruction. Not so in 64-bit mode though; dec esi is 2 bytes against sub esi, 1 being 3 bytes.

Further, I changed the initial check for esi being zero from using cmp to test, which is slightly better, refer to Test whether a register is zero with CMP reg,0 vs OR reg,reg?

Finally, I changed the actual loop conditional to be at the end of the loop body, which means the loop uses one jump instruction less. The unconditional jump to the loop body's start is replaced by the conditional jump that checks for the while condition. The test at the beginning of the function is still needed to handle the zero-length-array possibility. Also, instead of using cmp or test again to check for zero in esi, I just use the Zero Flag as already set up by the dec instruction to check whether esi was decremented to zero.

You could use ecx or rcx for the loop counter, but that would probably not be much of a win on modern CPUs. Would allow for slightly more compact code if you resorted to using the jrcxz, jecxz, or loop instructions. They aren't recommended though, due to slower performance.

Instead of comparing to the dword [rdi] and then, if less-or-equal than eax, loading from the same memory dword, you could load the array entry's value into a register first and then use that as source for cmp and mov. This might be faster, but it does result in more opcode bytes.

A trick I otherwise use to advance the Destination Index (rdi in 64-bit mode) by 4 is to use a single scasd instruction, which modifies only the flags and the index register. This is a single-byte instruction instead of the 4-byte add rdi, 4 but is highly probably slower to run.

I uploaded a repo with your original source and my improvements to https://hg.ulukai.org/ecm/testsmal/file/2b8637ca416a/ (Taken under the CC BY-SA usage conditions of stackoverflow content.) I modified the C part and the test script too, but these are trivial and mostly irrelevant to your question. Here's the assembly source:


INT_MAX equ 7FFF_FFFFh

SECTION .text
global find_smallest_int

find_smallest_int:

                ; If the array is empty (size = 0) then we want to return
                ; without reading from the array at all. The value to return
                ; then logically should be the highest possible number for a
                ; 32-bit signed integer. This is called INT_MAX in the C
                ; header limits.h and for 32-bit int is equal to 7FFF_FFFFh.
                ;
                ; If the array is not empty, the first iteration will then
                ; always leave our result register equal to the value in
                ; the first array entry. This is either equal to INT_MAX
                ; again, or less than that.
        mov eax, INT_MAX

                ; esi is the second argument to our function, which is
                ; declared as int find_smallest_int(int *, unsigned int).
                ; It represents the length of the array. We use this
                ; as a counter. rsi (and its part esi) need not be preserved
                ; across function calls for the AMD64 psABI that is used by
                ; Linux, see https://stackoverflow.com/a/40348010/738287

                ; Check for an initial zero value in esi. If this is found,
                ; skip the loop without any iteration (while x do y) and
                ; return eax as initialised to INT_MAX at the start.
        test esi, esi
        jz .end

.loop:
                ; If eax is smaller than dword [rdi], we'll jump down to the
                ; rest of the loop. Only if eax is bigger than or equal to
                ; the dword [rdi] will we reassign eax to that, to hold the
                ; new smallest-yet value.
        cmp eax, dword [rdi]
        jl .postassign

.assign:
                ; If we execute this code, it means eax was not less
                ; than dword [rdi]. Therefore, we can safely reassign
                ; eax to dword [rdi].
        mov eax, dword [rdi]


.postassign:
                ; Set rdi to point to the next value in the array.
        add rdi, 4


                ; Subtract one from our counter. This started as 
                ; the number of elements in the array - when it
                ; gets to 0, we'll have looped through the entire thing.
        dec esi

                ; Check to see if we've reached the end of the array.
                ; To do this, we use the Zero Flag as set by the prior
                ; dec instruction. If esi has reached zero yet (ZR) then
                ; we do not continue looping. In that case, we return the
                ; smallest value found yet (which is in eax at the moment).
                ;
                ; Else, we jump to the start of the loop to begin the next
                ; iteration.
        jnz .loop

.end:
        retn

Here's an alternative for the conditional jump within the loop body. The cmov instruction seems to be supported by all AMD64 CPUs. It is a conditional move: If the condition is met it runs like a mov -- it has no effect otherwise, with one exception: It may read the source operand (and thus might fault from the memory access). You can flip the condition you used for the branch around the mov to get the condition for the cmov instruction. (I came across this thread involving Linus Torvalds which indicates that the conditional jump solution may actually be better or no worse than cmov. Make of that what you will.)

.loop:
                ; If eax is greater than or equal to dword [rdi], we'll
                ; reassign eax to that dword, the new smallest-yet value.
        cmp eax, dword [rdi]
        cmovge eax, dword [rdi]

                ; Set rdi to point to the next value in the array.
        add rdi, 4

这篇关于返回数组中最小整数的汇编代码,而不是随机返回最后一个或倒数第二个数字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆