mov & jmp to & jmp back vs call & ret

Question

I was going over some assembly code and I saw this:

    mov r12, _read_loopr
    jmp _bzero
_read_loopr:
...
_bzero:
    inc r8
    mov byte [r8+r15], 0x0
    cmp r8, 0xff
    jle _bzero
    jmp r12

I was wondering whether there is any particular advantage to doing this (mov the address of _read_loopr into a register, jmp to the function, and then jmp back through the register) rather than the usual call _bzero and ret?

Recommended answer

This just looks like braindead code, especially if the return-address label is always right after the jmp _bzero, as you say in your comment.

Maybe the author thought they couldn't use call "because function calls clobber registers". That is what you have to assume, based on the calling convention, when you call a function that isn't part of the same codebase. But you can call/ret to functions with custom calling conventions.
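As a rough sketch of what a call/ret version with a custom convention might look like: the loop body below is the one from the question with the final jmp r12 swapped for ret. Everything around it is an assumed harness for illustration (the buf label, the _start entry point, the Linux exit syscall, and the build line are not from the original code), and the register contract in the comments is a made-up example of a documented custom convention.

```nasm
; Assumed build: nasm -f elf64 demo.asm && ld -o demo demo.o   (x86-64 Linux)
default rel

section .bss
buf:    resb 0x101          ; the loop's last store lands at index 0x100

section .text
global _start
_start:
    lea  r15, [buf]         ; r15 = buffer base (as in the question)
    mov  r8, -1             ; index is pre-incremented, so start one below 0
    call _bzero             ; pushes the return address; no r12 needed
                            ; execution resumes here after ret
    xor  edi, edi           ; exit status 0
    mov  eax, 60            ; Linux sys_exit
    syscall

; Custom convention (illustrative assumption, documented by the callee):
;   in:       r15 = buffer base, r8 = first index minus one
;   clobbers: r8 and flags only; every other register is preserved
_bzero:
    inc  r8
    mov  byte [r8+r15], 0x0
    cmp  r8, 0xff
    jle  _bzero
    ret
```

Because the callee states exactly which registers it touches, the caller does not have to treat the call as clobbering everything the standard convention allows.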

Of course, for code this small, it should have been inlined (i.e. make it a macro, not a function).
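A minimal sketch of the macro approach in NASM syntax, assuming the same register usage as the question's loop. The macro name BZERO_LOOP is hypothetical; the body is the original loop with the indirect jump removed, since an inlined copy simply falls through.

```nasm
; Hypothetical macro name; the body is the loop from the question.
;   in:       r15 = buffer base, r8 = first index minus one
;   clobbers: r8, flags
%macro BZERO_LOOP 0
%%again:                    ; %% makes the label unique in each expansion
    inc  r8
    mov  byte [r8+r15], 0x0
    cmp  r8, 0xff
    jle  %%again
%endmacro

; At each site that previously did mov r12, <label> / jmp _bzero:
    BZERO_LOOP              ; falls through to the next instruction when done
```

Each expansion costs a few bytes of code but removes both jumps, and the loop branch gets its own prediction history at every site.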

More importantly, something cleverer than storing one byte at a time is normally possible, and is probably worth a potential branch mispredict if there are more than a few bytes to zero. If at least 8 (or better, 16) bytes always need to be zeroed, you can do it with wide stores. Make the final store end at the last byte of the buffer to be zeroed, potentially overlapping the previous store. (This is much better than ending with branches to decide whether to do a final 4-byte store, 2-byte store, and 1-byte store.) See the x86 tag wiki for resources about writing efficient asm.
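The overlapping-final-store idea can be sketched as follows, assuming SSE2 (baseline on x86-64) and a length of at least 16 bytes. The routine name zero_ge16 and its register contract (rdi = buffer start, rsi = length) are assumptions for illustration, not part of the original code.

```nasm
; Hypothetical routine: zero rsi bytes starting at rdi, requires rsi >= 16.
;   clobbers: rax, rdi, xmm0, flags
zero_ge16:
    pxor    xmm0, xmm0          ; 16 bytes of zeros
    lea     rax, [rdi+rsi-16]   ; address of the final 16-byte store
    cmp     rdi, rax
    jae     .tail               ; exactly 16 bytes: only the final store
.loop:
    movdqu  [rdi], xmm0         ; unaligned 16-byte store
    add     rdi, 16
    cmp     rdi, rax
    jb      .loop               ; continue while a full chunk ends before the tail
.tail:
    movdqu  [rax], xmm0         ; covers the last 16 bytes, may overlap
    ret
```

For a 40-byte buffer this does stores at offsets 0, 16, and 24: the last one rewrites 8 bytes that were already zero, which is cheaper than branching on the leftover size.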

If the return address were somewhere other than right after the jmp _bzero, then the worst possible thing would probably be push _read_loopr / jmp _bzero, with a ret in _bzero. That would break the return-address predictor stack, leading to a mispredict on the next ~15 rets up the call tree.

Best would be to inline the loop and put a direct jmp after it.

I'm not sure how passing _bzero an address to jmp back to would compare with a call/ret followed by a jmp after the call.

call and ret are fairly cheap, but they are not single-uop instructions on Intel. A jmp _bzero / jmp _read_loopr pair would be better if there were only one caller.
