How to index through a string in assembly

Question

Given the variable:

var1    db  "abcdefg", NULL

How would I perform a loop to navigate each letter? In C++ you would do something like var[x] inside the loop, then increment x each time. Any ideas?

Recommended answer

In C and C++, strings are NUL terminated. This means that an ASCII NUL character (0) is added to the end of the string so that code can tell where the string ends. The strlen function walks through the string, starting from the beginning, and keeps looping until it encounters this NUL character. When it finds the NUL, it knows that's the end of the string, and it returns the number of characters from the beginning to the NUL as the string's length.
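
To make that concrete, here is a minimal sketch in C of what a strlen-style scan does. The name my_strlen is a hypothetical stand-in, not the real library implementation (which is usually heavily optimized), and the assert is only a guard added for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for strlen: counts the bytes before the NUL. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);          /* guard added for this sketch */

    const char *p = s;          /* start at the first character */
    while (*p != '\0') {        /* stop when the NUL terminator is reached */
        p++;                    /* advance to the next character */
    }
    return (size_t)(p - s);     /* characters between the start and the NUL */
}
```

Note that the cost grows with the length of the string, because every character has to be examined before the NUL is found.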

String literals (the things in double-quotation marks) are automatically NUL-terminated by a C/C++ compiler, so that:

"abcdefg"

is equivalent to the following array:

{'a', 'b', 'c', 'd', 'e', 'f', 'g', 0}
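
If you want to convince yourself of that equivalence, a small C check along these lines should do it. The function name literal_matches_array is made up for this illustration:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the literal "abcdefg" holds the same bytes as the explicit array. */
int literal_matches_array(void)
{
    const char literal[] = "abcdefg";   /* the compiler appends the NUL */
    const char array[]   = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 0};

    /* Both occupy 8 bytes: 7 letters plus the terminator. */
    assert(sizeof literal == sizeof array);

    return memcmp(literal, array, sizeof array) == 0;
}
```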

I mention this because Peter Rader suggested it in his answer, and you didn't really understand what he was talking about. However, it seems that you already know this, as you appended a NUL character to your string in the assembly declaration:

var1    db  "abcdefg", NULL

Now, generally, we don't use the identifier NULL for this. Especially not in C, where NULL is defined as a null pointer. We just use the literal 0, so that definition would be:

var1    db  "abcdefg", 0

but your code probably works, assuming that NULL is somewhere defined as 0.

So you've got the setup all correct. Now all you need to do is write your loop:

    mov  edx, OFFSET var1    ; get starting address of string

NextChar:                    ; (LOOP is an instruction mnemonic, so avoid it as a label name)
    mov  al, BYTE PTR [edx]  ; get next character
    inc  edx                 ; increment pointer
    test al, al              ; test value in AL and set flags
    jz   Finished            ; AL == 0, so exit the loop

    ; Otherwise, AL != 0, so we fell through.
    ; Here, you can do something with the character in AL.
    ; ...

    jmp  NextChar            ; keep looping

Finished:
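
For comparison, this is roughly the var[x] loop you described, written in C so you can line it up against the assembly. The task (counting how often a character occurs) and the name count_char are just assumptions for the sake of having something to do with each character:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example task: count how often `target` occurs in `s`. */
size_t count_char(const char *s, char target)
{
    assert(s != NULL);              /* guard added for this sketch */

    size_t count = 0;
    for (size_t x = 0; ; x++) {
        char c = s[x];              /* get next character (AL in the assembly) */
        if (c == '\0') {            /* the "test al, al / jz" step */
            break;
        }
        if (c == target) {          /* "do something with the character" */
            count++;
        }
    }
    return count;
}
```

The assembly version keeps a pointer and increments it rather than keeping a base plus an index, but the structure is the same: load, check for NUL, process, repeat.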

You say that you're familiar with the CMP instruction. In the code above, I used TEST rather than CMP. You could have equivalently written:

cmp  al, 0

but

test al, al

is slightly more efficient in the general case because it is a smaller instruction (comparing AL specifically has a short 2-byte encoding, so there the two happen to be the same size), so I'm just in the habit of writing it that way whenever I'm comparing a register's value to 0. Compilers will generate this code, too, so it's good to be familiar with it.

Bonus chatter: An alternative way of representing a string is to store its length (in characters) along with the string itself. This is what the Pascal language traditionally did. This way, you don't need the special NUL sentinel character at the end of the string. Rather, the declaration would look like this:

var1    db  7, "abcdefg"

where the first byte of every string is its length. This has various advantages over the C style, namely that you don't have to iterate through the entire string to determine its length. The primary disadvantage, of course, is that a string's length is limited to 255 characters, since that's all that will fit into a BYTE.
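
As a rough illustration in C, the same layout can be written as a byte array whose first element is the length. The helper name pstr_length is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Length-prefixed layout: first byte is the length, then the characters. */
static const unsigned char var1[] = {7, 'a', 'b', 'c', 'd', 'e', 'f', 'g'};

/* Hypothetical helper: the length is a single byte read, with no scanning. */
size_t pstr_length(const unsigned char *ps)
{
    assert(ps != NULL);     /* guard added for this sketch */
    return ps[0];           /* always 0..255, because it is stored in one byte */
}
```

Notice that the array is 8 bytes here too: the length byte takes the place of the NUL terminator.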

Anyway, with the length known in advance, you're no longer checking for a NUL character, you're just iterating the same number of times as the characters in the string:

    mov  edx, OFFSET var1    ; get starting address of string
    mov  cl, BYTE PTR [edx]  ; get length of string

NextChar:                    ; (LOOP is an instruction mnemonic, so avoid it as a label name)
    inc  edx                 ; increment pointer
    mov  al, BYTE PTR [edx]  ; get next character

    ; Do something with the character in AL
    ; (without clobbering EDX or CL).
    ; ...

    dec  cl                  ; decrement length
    jnz  NextChar            ; CL != 0, so keep looping

Finished:

(In the code above, I've assumed that all strings are a minimum of 1 character in length. This is probably a safe assumption, and avoids the need to do a length check above the loop.)
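
If you can't make that assumption, the length check is cheap to add. Here is a sketch in C of a counted loop over a length-prefixed string that also handles the empty case; the helper name pstr_to_cstr and the choice of copying into a NUL-terminated buffer are assumptions made for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy a length-prefixed string into `out` as a
   NUL-terminated C string. `out` must have room for length + 1 bytes. */
void pstr_to_cstr(const unsigned char *ps, char *out)
{
    assert(ps != NULL && out != NULL);  /* guards added for this sketch */

    unsigned char remaining = ps[0];    /* length byte (CL in the assembly) */
    const unsigned char *p = ps;        /* pointer (EDX in the assembly) */

    if (remaining == 0) {               /* the check the assembly leaves out */
        *out = '\0';
        return;
    }
    do {
        p++;                            /* advance to the next character */
        *out++ = (char)*p;              /* do something with the character */
        remaining--;                    /* one fewer character left */
    } while (remaining != 0);           /* keep looping until the count hits 0 */

    *out = '\0';
}
```

The do/while shape mirrors a loop whose test sits at the bottom, which is why a zero length has to be rejected before entering it.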

Alternatively, you could do the array-indexing that you mentioned, but you have to be a bit careful if you want to iterate forwards through the string:

    mov   edx, OFFSET var1        ; get starting address of string
    movzx ecx, BYTE PTR [edx]     ; get length of string
    lea   edx, [edx+ecx+1]        ; advance pointer by 1 + number of chars
    neg   ecx                     ; negate the length counter
NextChar:                         ; (LOOP is an instruction mnemonic, so avoid it as a label name)
    mov   al, BYTE PTR [edx+ecx]  ; get next character

    ; Do something with the character in AL.
    ; ...

    inc   ecx
    jnz   NextChar                ; ECX != 0, so keep looping

Basically, we set EDX to point just past the end of the string, we set the counter (ECX) to the negative of the length of the string, and then we read characters by indexing [EDX+ECX] (which, since ECX is negative, is equivalent to subtracting the number of characters remaining from EDX).
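
The same trick can be sketched in C, which may make the arithmetic easier to follow. The name copy_chars is hypothetical, and unlike the assembly this version tests the index before the first iteration, so it tolerates an empty string:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the characters of a length-prefixed string into
   `out` (NUL-terminated), walking forwards with a negative index that counts
   up to zero. `out` must have room for length + 1 bytes. */
void copy_chars(const unsigned char *ps, char *out)
{
    assert(ps != NULL && out != NULL);          /* guards added for this sketch */

    ptrdiff_t len = ps[0];                      /* length byte, zero-extended */
    const unsigned char *end = ps + 1 + len;    /* one past the last character */
    char *out_end = out + len;                  /* where the NUL will go */

    for (ptrdiff_t i = -len; i != 0; i++) {     /* i runs -len, ..., -1 */
        out_end[i] = (char)end[i];              /* end[-len] is the first character */
    }
    *out_end = '\0';
}
```

The appeal of counting up towards zero is that the increment itself sets the zero flag, so no separate compare against the length is needed at the bottom of the loop.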

There is almost certainly a better (more clever) way of doing this than I've managed to think up here, but you should get the idea.
