Converting Uppercase to Lowercase in Assembly
Question
I'm writing code to convert a preset string from uppercase to lowercase. I currently move the byte at the address into an 8-bit register, then test its ASCII value in a very sloppy way to see whether it's uppercase. Is there a cleaner way to go about it?
Right now I'm subtracting 65 from the ASCII value and comparing the result to 25. Since uppercase letters are ASCII 65-90 (decimal), any uppercase letter results in 0-25.
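For comparison, here is a minimal C sketch of the check described above. `to_lower_sub65` is a hypothetical name, and the length is passed in as a parameter the way the question uses strSize. The cast to unsigned char plays the role of the 8-bit register: anything below 65 wraps around to a large value, so a single compare rejects it.

```c
#include <stddef.h>

/* Hypothetical C rendering of the question's approach:
   subtract 65 ('A') and compare the result to 25 as an UNSIGNED byte.
   Bytes below 'A' wrap around to 191..255, so one compare rejects
   both "too small" and "too big". */
static void to_lower_sub65(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        unsigned char offset = (unsigned char)(c - 65); /* 'A'..'Z' -> 0..25 */
        if (offset <= 25)                /* like CMP BL, 25 / JA NoCap */
            s[i] = (char)(c + 32);       /* like ADD AL, 32 */
    }
}
```

Called on the question's string with a length of 23, only the letters change; punctuation such as `?` and `[` is left alone.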
.DATA
string DB "ATest This String?.,/[}", '$'
strSize DD 23
.CODE
strToLower PROC
LEA EAX, string
PUSH EAX
CALL toLower2 ; write toLower2
POP EAX
LEA EAX, string ; return char* to C++
RET
strToLower ENDP
;---------------------------------------------
;Procedure: Convert to LowerCase
;Input: Address in EBX
; unsigned in AL for each letter
;Output: EAX will contain new string
;---------------------------------------------
toLower2 PROC ;65-90 is upper, 97-122 is lower (XOR 32?)
LEA EBX, string
MOV ECX, strSize
PUSH AL ; PUSH AL before manipulating it
loop1: MOV AL, [EBX] ; Put char into AL to manipulate
XOR BL, BL ;?????????????
MOV BL, AL ;Set condition here???
SUB BL, 65 ;?????????????
CMP BL, 25 ;if(i > 64 && < 91) i += 32;
JA NoCap ;
ADD AL, 32 ;Adds 32 to ASCII value, making lower
NoCap: MOV [EBX], AL
INC EBX
LOOP loop1
POP AL ;Replace/POP AL
LEA EAX, string
toLower2 ENDP
END
Recommended Answer
SUB followed by an unsigned compare is a good way to check whether an input is within a certain range using only one conditional branch, instead of separate compare-and-branches for >= 'A' and <= 'Z'.
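To see why one unsigned compare is enough, the C sketch below (function names are hypothetical) implements both forms and checks that they agree for every possible byte value. Subtracting 'A' sends everything below 'A' to a large unsigned number, so the single `<= 25` test rejects those values too.

```c
#include <stdbool.h>

/* Two compare-and-branches: c >= 'A' && c <= 'Z'. */
static bool is_upper_two_cmp(unsigned char c)
{
    return c >= 'A' && c <= 'Z';
}

/* One subtraction plus one unsigned compare. */
static bool is_upper_one_cmp(unsigned char c)
{
    return (unsigned char)(c - 'A') <= 'Z' - 'A';   /* 'Z' - 'A' == 25 */
}

/* Exhaustively confirm both checks agree on all 256 byte values. */
static bool range_checks_agree(void)
{
    for (int c = 0; c < 256; c++)
        if (is_upper_two_cmp((unsigned char)c) != is_upper_one_cmp((unsigned char)c))
            return false;
    return true;
}
```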
Compilers use this trick when possible. See also Agner Fog's Optimizing Assembly guide, and other links in the x86 tag wiki for more stuff about writing efficient asm.
You can even use it to detect alphabetic characters (lower or upper case) with one branch: OR with 0x20 will make any upper-case character lower-case, but won't make any non-alphabetic character alphabetic. So do that, then use the unsigned-compare trick to check for being in the lower-case range. (Or start with AND with ~0x20 to clear that bit, forcing upper-case.) I used this trick in an answer on flipping the case of alphabetic characters while leaving other characters alone.
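A minimal C sketch of that idea, with hypothetical names: OR with 0x20 folds 'A'..'Z' onto 'a'..'z', so one unsigned range check on the result accepts letters of either case. A byte can only land in 'a'..'z' after the OR if it was already a lower-case letter or was the matching upper-case one, so nothing else is misclassified. The second function uses that check to flip the case of letters only.

```c
#include <stdbool.h>
#include <stddef.h>

/* True for 'A'..'Z' and 'a'..'z', false for every other byte. */
static bool is_ascii_alpha(unsigned char c)
{
    return (unsigned char)((c | 0x20) - 'a') <= 'z' - 'a';
}

/* Flip the case of letters, leaving all other bytes untouched. */
static void flip_case(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (is_ascii_alpha(c))
            s[i] = (char)(c ^ 0x20);   /* toggle the case bit */
    }
}
```

Note that '@' (0x40) and '[' (0x5B) sit just outside the upper-case range; after the OR they become '`' and '{', which are still outside 'a'..'z'.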
And yes, as you noticed, ASCII is designed so the difference between upper/lower case for every letter is just flipping one bit. Every lowercase character has 0x20 set, while uppercase has it cleared. AND/OR/XOR are typically preferable for doing this (vs. ADD/SUB), because you can sometimes take advantage of not caring about the initial state, when forcing to one case.
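The difference between the bitwise operations and ADD shows up when the input's case is not known in advance. In this small C sketch (hypothetical helper names, and the caller is assumed to have already checked that the byte is a letter), OR and AND give the right answer whichever case they start from, while ADD 32 is only correct for upper-case input.

```c
/* Bitwise forms: correct regardless of the letter's current case. */
static unsigned char force_lower_or(unsigned char c)  { return (unsigned char)(c | 0x20); }
static unsigned char force_upper_and(unsigned char c) { return (unsigned char)(c & ~0x20); }
static unsigned char toggle_case_xor(unsigned char c) { return (unsigned char)(c ^ 0x20); }

/* ADD form: only correct if c is already known to be 'A'..'Z'. */
static unsigned char lower_by_add(unsigned char c)    { return (unsigned char)(c + 32); }
```

For example, `force_lower_or('a')` is still `'a'`, but `lower_by_add('a')` yields 129, which is not a letter at all.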
Your code has some weird stuff: PUSH AL doesn't even assemble with most assemblers, since the minimum size for push/pop is 16 bits. There's also no point to saving/restoring AL, because you clobber the whole of EAX right after restoring AL after the loop!
Also, MOV just overwrites its destination, so there's no need to xor bl,bl.
Also, you use BL as a scratch register, but it's the low byte of EBX (which you use as a pointer!)
Here's how I might do it, using only EAX, ECX and EDX so I don't have to save/restore any registers. (Your function clobbers EBX, which most 32- and 64-bit calling conventions require functions to save/restore.) I'd need an extra register if string weren't statically allocated; because it is, I can use its address as an immediate constant.
toLower2 PROC ;65-90 is upper, 97-122 is lower (XOR 32?)
mov edx, OFFSET string ; don't need LEA for this, and mov is slightly more efficient
add edx, strSize ; This should really be an equ definition, not a load from memory.
; edx starts at one-past-the-end, and we loop back to the start
loop1:
dec edx
movzx eax, byte ptr [edx] ; mov al, [edx] leaving high garbage in EAX is ok, too, but this avoids a partial-register stall when doing the mov+sub in one instruction with LEA
lea ecx, [eax - 'A'] ; cl = al-'A', and we don't care about the rest of the register
cmp cl, 25 ;if(c >= 'A' && c <= 'Z') c |= 0x20;
ja NoCap
or al, 20h ; tolower (MASM writes hex as 20h, not 0x20)
mov [edx], al ; since we're branching anyway, make the store conditional
NoCap:
cmp edx, OFFSET string
ja loop1
mov eax, edx ; EDX == OFFSET string here: return char* to the caller
ret
toLower2 ENDP
The LOOP instruction is slow, and should be avoided. Just forget it even exists and use whatever loop condition is convenient.
Only doing the store when the character changes makes the code more efficient: if there's nothing to convert, it won't dirty cache lines in memory that hasn't been written for a while.
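The branchy loop above can be sketched in C to make its control flow easier to follow. This is an illustration, not the answer's code: `to_lower_backwards` is a hypothetical name, and the byte count is a parameter instead of strSize. Like the asm, it walks a pointer from one past the end back to the start and writes a byte only when it changes. Because the while loop tests before decrementing, it also copes with a size of 0, which the asm version (decrement first, test at the bottom) does not handle.

```c
#include <stddef.h>

/* Hypothetical C counterpart of the branchy toLower2 loop. */
static char *to_lower_backwards(char *str, size_t size)
{
    char *p = str + size;                    /* one past the end, like EDX */
    while (p > str) {                        /* cmp edx, OFFSET string / ja loop1 */
        --p;                                 /* dec edx */
        unsigned char c = (unsigned char)*p;
        if ((unsigned char)(c - 'A') <= 25)  /* lea ecx,[eax-'A'] / cmp cl,25 / ja NoCap */
            *p = (char)(c | 0x20);           /* or al,20h / conditional store */
    }
    return str;                              /* mov eax, edx: pointer to the start */
}
```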
Instead of ja NoCap, you could do that branchlessly with a cmov. But now I have to ignore my suggestion to prefer AND/OR instead of ADD/SUB, because we can use LEA to add 0x20 without affecting flags, saving us a register.
loop1:
dec edx
movzx eax, byte ptr [edx] ; mov al, [edx] leaving high garbage in EAX is ok, too, but this avoids a partial-register stall when doing the mov+sub in one instruction with LEA
lea ecx, [eax - 'A'] ; cl = al-'A', and we don't care about the rest of the register
cmp cl, 25 ;if(c >= 'A' && c <= 'Z') c += 0x20;
lea ecx, [eax + 20h] ; without affecting flags
cmovna eax, ecx ; take the +0x20 version if it was in the uppercase range to start with
; al = tolower(al)
mov [edx], al
cmp edx, OFFSET string
ja loop1
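A rough C counterpart of the cmov loop, again with a hypothetical name, computes both the original byte and the +0x20 version, selects one based on the range check, and stores unconditionally. C has no way to force a CMOV, so whether the compiler emits one or a branch for the selection depends on its optimizer and target flags.

```c
#include <stddef.h>

/* Hypothetical select-based version of the cmov loop. */
static void to_lower_select(char *str, size_t size)
{
    for (char *p = str + size; p > str; ) {
        --p;
        unsigned char c = (unsigned char)*p;
        unsigned char plus20 = (unsigned char)(c + 0x20);  /* lea ecx, [eax + 20h] */
        int is_upper = (unsigned char)(c - 'A') <= 25;      /* cmp cl, 25 */
        *p = (char)(is_upper ? plus20 : c);                 /* cmovna, then unconditional store */
    }
}
```

The trade-off matches the asm: no branch to mispredict, but every byte is written back, even when nothing changed.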