How do I compare two strings in assembly (nasm)

Question

REVISED CODE: Is there a way to optimize this?

read_pass:
    ; read password

    ; read(int fd, void *buf, size_t count);
    ; #define __NR_read 0
    ; rdi = unsigned int fd
    ; rsi = char *buf
    ; rdx = size_t count
    xor rax, rax
    mov rdi, rax
    mov rsi, userpass
    mov rdx, rax
    add rdx, 0x64 ; 100 
    syscall

    lea rdi, [passcode]
    lea rsi, [userpass]
    mov rcx, pclen

    repe cmpsb 
    je do_something

    jmp read_pass

section .data
    passcode db 'hi', 0xa  
    pclen equ $ - passcode 
    userpass times 100 db 0
    uplen equ $ - userpass

ORIGINAL Q that the comments are replying to was asking about similar code, but using incorrect operands to cmps:

How do I compare two strings in assembly (nasm)?

I get the following error when compiling:
Pass.nasm:129: error: invalid combination of opcode and operands

Line 129 is:
cmpsq userpass, passcode

(I also tried cmp and cmps.)

Recommended answer

Since this is a read/check password function, optimizing for speed on repeated calls makes no sense. Optimizing for code size (and lack of any major stalls on the first run) is the way to go, to minimize cache pollution (esp. the small and very valuable uop-cache). See http://agner.org/optimize/ (and some of the other resources linked from https://stackoverflow.com/tags/x86/info) for excellent info.

I did find some bugs / security holes in your code, and ways to save bytes. Also, keeping your read buffer on the stack would save 100 bytes of BSS space. See below.

It looks like you want to hard-code a read(2) from stdin (fd 0) into a buffer of length 100. If you read at most 99 bytes, the last byte of the zero-initialized buffer is never overwritten and your string stays zero-terminated, so I'd suggest doing that.

Loading addresses of global variables / arrays into registers in AMD64 is best done with mov r32, imm32, according to gcc/clang/icc. A RIP-relative lea is the best choice if you don't know that the address fits in the low 32 bits of virtual memory, or if you need to make position-independent code. Data-section addresses are in the low 32 bits in the Linux x86_64 programming model, so the 5-byte mov r32, imm32 works. mov r64, imm32 sign-extends the 32bit value. We don't want that, and it takes a REX prefix byte, so it's actually better (but more confusing to read) to load known-32bit addresses into 32bit registers. Obviously, arbitrary 64bit addresses will be truncated if you do this. If unsure, use lea r64, [rel addr], and of course always use 64bit operand-size when working with addresses as function arguments or whatever.

If you do need to deal with 64bit addresses for globals, it's maybe worth loading the address just once and keeping it in a register across the system call. On x86-64 Linux the syscall instruction itself only overwrites rax (the return value), rcx and r11, so any other register survives it. If you pick a callee-saved register such as rbx, you have to push/pop the caller's rbx at the start/end of your function.

    xor eax, eax                ;  writing a 32bit reg always zeros the upper32, and saves a REX prefix byte
    xor edi, edi                ; read(fd 0)
    mov esi, userpass           ; lea rsi, [rel userpass]
    lea edx, [rax + uplen - 1]  ;  shorter and harder for humans to read than mov edx, uplen - 1
    syscall

    ; continued below

section .rodata
    ; passcode can be part of the shared read-only mapping of the executable, not copy-on-write.
    passcode db 'hi', 0xa    ; it's not normal to include the newline in the password, but it does make the code simpler I guess
    pclen equ $ - passcode

section .data
    userpass times 100 db 0
    uplen equ $ - userpass

Zeroing a register by copying from an already-zeroed register is also a 2-byte instruction, like xor. It might have a slight advantage on AMD CPUs, where it can run on more execution ports. On Intel, Sandybridge handles xor same,same in the register-rename stage, not using an execution unit at all, and giving it a throughput of 4 per clock. IDK if AMD will ever pick up this trick. It's not until IvyBridge that mov reg,reg is also handled at the register-rename stage of the pipeline, and also doesn't need an execution unit. Probably no measurable difference either way, since it's at the start of a short dependency chain, so I'd prefer the xor-zeroing just to make it easier to read (i.e. you don't have to remember that eax was zeroed when looking at xor edi,edi.)

To get the buffer length into a register, it might technically be shorter code to zero a reg and then add reg,imm8, but that's 2 Intel uops / AMD macro-ops, instead of just one for a mov reg, imm32 which is only one byte longer. (Thanks to automatic zeroing of the upper32 when writing a 32bit reg.) Actually, a decent way to save 2 bytes would be lea edx, [rax + uplen - 1], where rax is a reg that you just zeroed. lea with a signed-8bit displacement only takes 3 bytes to encode. In long mode, the default operand size is 32bit and the default address size is 64bit, which is why a 32bit dest register and an addressing mode using 64bit register(s) is the most compact. Sometimes disassembling an existing binary (objdump -d /bin/ls or something) is the quickest way to check how many bytes a certain kind of instruction encoding takes, if you know the encoding rules well enough to recognize an instruction that has the same length as the one you want but uses other registers.
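For reference, here is a small listing of the encodings discussed above, assuming uplen is 100 as in the question. The byte values in the comments are the standard x86-64 encodings; assembling the file with nasm -f elf64 -l sizes.lst (the listing file name is arbitrary) should show them in the listing so they can be checked.

```nasm
; Encoding sizes for getting the read() count into rdx/edx.
; The lea variant assumes rax is already zero at that point.
uplen equ 100                     ; assumed buffer size, as in the question

section .text
    xor  rax, rax                 ; 48 31 C0        3 bytes (REX.W prefix)
    xor  eax, eax                 ; 31 C0           2 bytes, also zeroes the upper half of rax

    mov  rdx, rax                 ; 48 89 C2        3 bytes
    add  rdx, 0x64                ; 48 83 C2 64     4 bytes (7 bytes for the pair)

    mov  edx, uplen - 1           ; BA 63 00 00 00  5 bytes
    lea  edx, [rax + uplen - 1]   ; 8D 50 63        3 bytes, only correct while rax == 0
```

The lea trick only stays at 3 bytes while the constant fits in a signed 8-bit displacement (at most 127); a larger buffer length would need a 32-bit displacement.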

Now let's look at your actual password-check code. First of all, it's normal to only store hashes of passwords, not the plaintext passwords themselves. Anyone considering actually using this code for any non-toy use should stop reading and go look that up. The more you can re-use well-tested libraries, the less risk of overlooking a security hole.

; continuing from above:
; ... syscall

test eax, eax       ; read(2) result in eax
jle  EOF_or_error   ; In C, most of the code in systems programming is checking for errors.

; lea rdi, [passcode]
; lea rsi, [userpass]
; If you use lea, make sure you use RIP-rel, because 64bit absolute addressing is only available for mov rax, [addr64].

mov edi, passcode   ; 5 bytes, see above discussion of loading addresses.
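lea rsi, [rdi + userpass - passcode]  ; This is only 4 bytes.  3 bytes if dest is esi, not rsi. (no REX needed).
; NOTE: the short encoding needs userpass - passcode to be an assemble-time constant that fits in a signed 8-bit displacement, which requires both labels to be in the same section.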

mov ecx, pclen   ; we know pclen < uplen, so this can't buffer overflow, but see text for security problems from not looking at length of read
repe cmpsb 
jne read_pass
; fall through to do_something, or to a ret insn.  Saves a jmp

This looks pretty reasonable now. Including the newline in the password is what lets you get away with not even checking the length of the password you read. You do need to check that you read something, or else you might just be comparing the password against the previous correct input, if read didn't touch any bytes in your buffer.

Actually, with the tty in line-buffered "cooked" input mode, read(2) returns what's been typed so far when you press ctrl-d (EOF), even when it doesn't include a newline. A subsequent call to read will read more. So you need to worry about that, as well as interrupted system calls (e.g. by a signal). This is one of the things library I/O functions handle for you.
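A minimal sketch of what handling short reads and interrupted calls by hand could look like. The helper read_line and the buffer linebuf are hypothetical names, not part of the original code. It keeps calling read(2) until a chunk ends with a newline, a read returns 0 (EOF), or 99 bytes have been stored, and it retries when the kernel returns -EINTR (errno 4 on Linux).

```nasm
section .bss
    linebuf resb 100                ; hypothetical input buffer

section .text
global read_line
read_line:                          ; returns eax = bytes stored, or a negative errno in rax
    push rbx                        ; rbx is callee-saved, so preserve the caller's value
    xor  ebx, ebx                   ; ebx = bytes stored so far
.again:
    lea  rsi, [rel linebuf]
    add  rsi, rbx                   ; append after what we already have
    mov  edx, 99
    sub  edx, ebx                   ; edx = room left in the buffer
    jz   .done                      ; buffer full
    xor  eax, eax                   ; __NR_read = 0
    xor  edi, edi                   ; fd 0 = stdin
    syscall                         ; overwrites rax, rcx and r11 only
    cmp  rax, -4                    ; -EINTR: interrupted before any data arrived
    je   .again
    test rax, rax
    js   .error                     ; any other error: hand it back to the caller
    jz   .done                      ; 0 bytes = EOF
    add  ebx, eax                   ; account for the bytes just stored
    cmp  byte [rsi + rax - 1], 0xa  ; did this chunk end with a newline?
    jne  .again                     ; no: keep reading
.done:
    mov  eax, ebx                   ; total number of bytes stored
.error:
    pop  rbx
    ret
```

With this kind of loop a ctrl-d in the middle of a line no longer ends the input early; the caller only sees the accumulated line. Input longer than 99 bytes is left in the tty queue for the next call, which a real program would also have to deal with.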

Try with cat: you can type some characters, and then have them echoed without a newline by pressing ctrl-d. So this password routine has a huge security hole: If the previous correct password is sitting there in the buffer, all I have to do is guess the first character. I can make repeated guesses, just pressing ctrl-d after every character.

You'd avoid this problem if you zeroed the buffer (with mov ecx, eax / xor eax,eax / rep stosb, with rdi pointing at the start of the buffer, where eax is the read return value that you've checked is >= 0). That wipes the old password entry from memory as soon as it's been checked. Of course, the correct password is still just sitting there in plain text. If you don't care about passwords sitting around in memory, you could just check the number of characters read against the length of the correct password.
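The wipe could be sketched as below. The label wipe_userpass is a hypothetical helper name; userpass is the 100-byte input buffer from the question, declared here so the fragment assembles on its own. It assumes eax holds the byte count returned by read and that the direction flag is clear, as the ABI guarantees on function entry.

```nasm
section .bss
    userpass resb 100               ; same role as the question's input buffer

section .text
global wipe_userpass
wipe_userpass:                      ; in: eax = number of bytes read (already checked > 0)
    mov  ecx, eax                   ; rep count = bytes to wipe (writing ecx zero-extends into rcx)
    lea  rdi, [rel userpass]        ; destination = start of the input buffer
    xor  eax, eax                   ; al = 0 is the fill byte for stosb
    rep  stosb                      ; store rcx zero bytes at [rdi], advancing rdi
    ret
```

Note that xor eax,eax overwrites the flags left by repe cmpsb, so the comparison result has to be branched on (or saved, e.g. with setne) before the wipe runs.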

; not shown: check for EOF/error

mov ecx, pclen
cmp ecx, eax    ; check lengths to avoid EOF first-char guessing
jne read_pass

; not shown: set up addresses

repe cmpsb      ; check contents
jne read_pass

; They match, do whatever here.

I don't see a clever way to only use one test or cmp instruction to check for zero / negative return value and check that

Another point: the password input buffer could be on the stack. If this code doesn't have to run unchanged on Windows, you don't even have to futz with RSP, you can just use the red-zone below the current stack pointer, which signal handlers won't clobber. Then you aren't wasting 100 bytes permanently for a buffer that's only used during password input. Since I've shown you should really be checking the return value of read anyway, the old contents don't matter, whether you have it on the stack or from malloc.
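A sketch of the red-zone variant as a complete little program, assuming the x86-64 System V ABI on Linux (the 128 bytes below rsp are not touched by signal handlers). The exit codes and the local label .exit_fail are illustrative choices, and passcode is redefined here only to keep the example self-contained. Nothing between the read and the compare may push or call, because that would overwrite the buffer.

```nasm
section .rodata
    passcode db 'hi', 0xa
    pclen equ $ - passcode

section .text
global _start
_start:
read_pass:
    xor  eax, eax               ; __NR_read = 0
    xor  edi, edi               ; fd 0 = stdin
    lea  rsi, [rsp - 100]       ; 100-byte buffer inside the 128-byte red zone
    lea  edx, [rax + 100]       ; count = 100 (rax is zero here)
    syscall                     ; rsi survives: only rax, rcx and r11 are overwritten
    test eax, eax
    jle  .exit_fail             ; EOF or error
    cmp  eax, pclen             ; length check first, so stale bytes never get compared
    jne  read_pass
    lea  rdi, [rel passcode]
    mov  ecx, pclen
    repe cmpsb                  ; compares [rsi] (input) with [rdi] (passcode)
    jne  read_pass
    mov  eax, 60                ; __NR_exit
    xor  edi, edi               ; status 0: password matched
    syscall
.exit_fail:
    mov  eax, 60                ; __NR_exit
    mov  edi, 1                 ; status 1: EOF or read error
    syscall
```

It can be built with nasm -f elf64 followed by ld. Because the length is checked before the contents, the buffer does not need to be zero-terminated or pre-cleared.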

For speed on repeated strcmp calls, the startup overhead of rep cmpsb may make it worse than a normal loop for short strings. For memset/memcpy, I think the threshold where rep stos / rep movs are faster than an optimized SSE loop is about 128B or so, on Intel CPUs with Fast String Operations (IvB and later).
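For comparison, a plain byte-at-a-time loop might look like the sketch below. The name bytes_equal and its calling convention (rdi and rsi point at the two buffers, ecx is the length and must be greater than zero, eax returns 1 for equal and 0 otherwise) are assumptions made for this example.

```nasm
section .text
global bytes_equal
bytes_equal:                  ; in: rdi, rsi = buffers, ecx = length (> 0)
.next:
    movzx eax, byte [rdi]     ; load one byte of the first buffer, upper bits of eax zeroed
    cmp   al, [rsi]           ; compare with the matching byte of the second buffer
    jne   .done               ; ZF = 0 on the first mismatch
    inc   rdi
    inc   rsi
    dec   ecx
    jnz   .next               ; dec leaves ZF = 1 once the count reaches zero
.done:
    sete  al                  ; eax = 1 if every byte matched, else 0
    ret
```

Whether this actually beats rep cmpsb for a given string length depends on the microarchitecture, so it is something to measure, not assume.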
