Converting Uppercase to Lowercase in Assembly issue


Question

I'm writing a routine to convert a preset string from uppercase to lowercase. I currently move the byte at each address into an 8-bit register, then test its ASCII value in a very sloppy way to see whether it's uppercase. Is there a cleaner way to go about it?

Right now I subtract 65 from the ASCII value and compare the result to 25. Since uppercase letters are ASCII (decimal) 65-90, any uppercase letter results in 0-25.

    .DATA
string  DB   "ATest This String?.,/[}", '$'
strSize DD  23
.CODE
strToLower  PROC
        LEA     EAX, string
        PUSH    EAX
        CALL    toLower2    ; write toLower2
        POP EAX
        LEA EAX, string     ; return char* to C++
        RET
strToLower  ENDP

;---------------------------------------------
;Procedure: Convert to LowerCase
;Input: Address in EBX
;       unsigned in AL for each letter
;Output: EAX will contain new string
;---------------------------------------------

toLower2    PROC    ;65-90 is upper, 97-122 is lower (XOR 32?)
            LEA EBX, string
            MOVE ECX, strSize
            PUSH AL     ; PUSH AL before manipulating it
loop1:      MOV AL, [EBX]   ; Put char into AL to manipulate
            XOR BL, BL          ;?????????????
            MOV BL, AL          ;Set condition here???
            SUB BL, 65          ;?????????????
            CMP BL, 25          ;if(i > 64 && < 91) i += 32;
            JA  NoCap           ;
            ADD AL, 32          ;Adds 32 to ASCII value, making lower 
NoCap:      MOV [EBX], AL
            INC EBX
            LOOP loop1
            POP AL      ;Replace/POP AL
            LEA EAX, string
toLower2    ENDP
            END

Recommended answer

SUB followed by an unsigned compare is a good way to check whether an input lies within a certain range using only one conditional branch, instead of separate compare-and-branches for >= 'A' and <= 'Z'.
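
For readers who think more easily in C, here is a minimal sketch of the same range check, assuming ASCII input. The function name is_upper_ascii is hypothetical and does not come from the question or the answer.

```c
#include <stdbool.h>

/* Single-comparison range check: subtracting 'A' maps 'A'..'Z' to 0..25.
   Any other byte ends up above 25 once the difference is truncated to an
   unsigned 8-bit value (bytes below 'A' wrap around to large values). */
bool is_upper_ascii(unsigned char c)
{
    return (unsigned char)(c - 'A') <= 'Z' - 'A';
}
```

The cast back to unsigned char plays the role of the unsigned compare (JA) in the assembly: a "negative" difference becomes a large positive one and fails the single test.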

Compilers use this trick whenever they can. See also Agner Fog's optimizing assembly guide.

You can even use it to detect alphabetic characters (lower or upper case) with one branch: OR with 0x20 will make any upper-case character lower-case, but won't make any non-alphabetic characters alphabetic. So do that, then use the unsigned-compare trick to check for being in the lower-case range. (Or start with AND with ~0x20 to clear that bit, forcing upper-case). I used this trick in an answer on flipping the case of alphabetic characters while leaving other characters alone.
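
The alphabetic test described above might look like this in C. It is a sketch that assumes ASCII, and the name is_alpha_ascii is made up for illustration.

```c
#include <stdbool.h>

/* OR with 0x20 turns 'A'..'Z' into 'a'..'z' and leaves lowercase letters
   alone, but never maps a non-letter onto a letter. One unsigned range
   check against 'a'..'z' then covers both cases. */
bool is_alpha_ascii(unsigned char c)
{
    unsigned char lowered = c | 0x20;
    return (unsigned char)(lowered - 'a') <= 'z' - 'a';
}
```

For example, '@' (0x40) becomes '`' (0x60) and '[' (0x5B) becomes '{' (0x7B), both of which sit just outside the lowercase range.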

And yes, as you noticed, ASCII is designed so that the upper- and lower-case forms of every letter differ by a single bit. Every lowercase letter has bit 0x20 set, while every uppercase letter has it cleared. AND/OR/XOR are typically preferable to ADD/SUB for this, because when forcing one particular case you can sometimes take advantage of not caring about the initial state.
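
A short C sketch of the three bitwise operations, assuming the input is already known to be an ASCII letter. The macro and function names are hypothetical.

```c
#define ASCII_CASE_BIT 0x20u   /* the one bit that differs between 'A' and 'a' */

/* OR sets the bit: the result is lowercase whatever the input case was. */
unsigned char force_lower(unsigned char letter) { return letter | ASCII_CASE_BIT; }

/* AND with the inverted mask clears the bit: the result is uppercase. */
unsigned char force_upper(unsigned char letter) { return letter & ~ASCII_CASE_BIT; }

/* XOR toggles the bit: the case is flipped. */
unsigned char flip_case(unsigned char letter)   { return letter ^ ASCII_CASE_BIT; }
```

Unlike adding or subtracting 32, force_lower and force_upper are idempotent, which is the "not caring about the initial state" advantage: applying force_lower to 'a' still yields 'a'.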

Your code has some weird stuff: PUSH AL doesn't even assemble with most assemblers, since the minimum operand size for push/pop is 16 bits. There's also no point in saving/restoring AL, because you clobber the whole of EAX right after restoring AL after the loop!

Also, MOV just overwrites its destination, so there's no need for xor bl,bl.

Also, you use BL as a scratch register, but it's the low byte of EBX (which you use as a pointer!)

Here's how I might do it, using only EAX, ECX and EDX so I don't have to save/restore any registers. (Your function clobbers EBX, which most 32 and 64-bit calling conventions require functions to save/restore.) Because string is statically allocated, I can use its address as an immediate constant; if it weren't, I'd need an extra register.

toLower2    PROC    ;65-90 is upper, 97-122 is lower (XOR 32?)
            mov   edx, OFFSET string   ; don't need LEA for this, and mov is slightly more efficient
            add   edx, strSize         ; This should really be an equ definition, not a load from memory.

            ; edx starts at one-past-the-end, and we loop back to the start
loop1:
            dec   edx
            movzx eax, byte ptr [edx]  ; zero-extend so the LEA below avoids a partial-register stall (mov al, [edx] leaving high garbage in EAX is ok, too)
            lea   ecx, [eax - 'A']     ; mov+sub in one instruction: cl = al-'A', and we don't care about the rest of the register

            cmp   cl, 25               ; if(c >= 'A' && c <= 'Z') c |= 0x20;
            ja    NoCap
            or    al, 20h              ; tolower
            mov   [edx], al            ; since we're branching anyway, make the store conditional
NoCap:
            cmp   edx, OFFSET string
            ja    loop1

            mov   eax, edx             ; edx is back at the start of string: return the char*
            ret
toLower2    ENDP

The LOOP instruction is slow and should be avoided. Just forget it even exists and use whatever loop condition is convenient.

Storing only when the character actually changes makes the code more efficient: when there is nothing to do, it won't dirty cache lines of memory that hasn't been modified for a while.
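
The same loop shape can be sketched in C, again assuming ASCII. The name str_to_lower and the returned store count are hypothetical additions made only so the effect of the conditional store is observable, and a compiler is free to generate different code than the hand-written assembly.

```c
#include <stddef.h>

/* Lowercases s[0..len-1] in place, writing to memory only for bytes that
   actually change. Returns how many bytes were stored. */
size_t str_to_lower(char *s, size_t len)
{
    size_t stores = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if ((unsigned char)(c - 'A') <= 25) {   /* uppercase ASCII letter? */
            s[i] = (char)(c | 0x20);            /* store only on a change */
            stores++;
        }
    }
    return stores;
}
```

On the question's 23-byte test string only the four uppercase letters are rewritten; every other byte is read but never stored.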

Instead of ja NoCap, you could do this branchlessly with a cmov. But now I have to ignore my own suggestion to prefer AND/OR over ADD/SUB, because LEA can add 0x20 without affecting flags, which saves us a register.

loop1:
            dec   edx
            movzx eax, byte ptr [edx]  ; zero-extend so the LEAs below avoid a partial-register stall (mov al, [edx] leaving high garbage in EAX is ok, too)
            lea   ecx, [eax - 'A']     ; mov+sub in one instruction: cl = al-'A', and we don't care about the rest of the register

            cmp   cl, 25               ; if(c >= 'A' && c <= 'Z') c += 0x20;
            lea   ecx, [eax + 20h]     ; without affecting flags
            cmovna eax, ecx            ; take the +0x20 version if it was in the uppercase range to start with
            ; al = tolower(al)

            mov   [edx], al            ; unconditional store this time
            cmp   edx, OFFSET string
            ja    loop1
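
A C sketch of the same branchless idea, assuming ASCII. The name to_lower_branchless is hypothetical, and whether a compiler emits cmov, a shift and add, or something else for it depends on the compiler and its flags.

```c
/* Branchless tolower for one byte: the range check yields 1 or 0, which is
   shifted into the 0x20 position and added, so no conditional jump is
   needed in the source. */
unsigned char to_lower_branchless(unsigned char c)
{
    unsigned char is_upper = (unsigned char)(c - 'A') <= 25;  /* 1 or 0 */
    return (unsigned char)(c + (is_upper << 5));              /* adds 0x20 or 0 */
}
```

As in the cmov version, the result is always computed and written back, so this trades the conditional store for a predictable instruction stream.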
