Assembly loop through a string to count characters


Problem description

I am trying to write assembly code that counts how many characters are in a string, but I get an error.

Here is the code. I am using gcc with .intel_syntax:

#include <stdio.h>

int main(){
char *s = "aqr  b qabxx xryc pqr";
int x;

asm volatile (
    ".intel_syntax noprefix;"
    "mov eax, %1;"
    "xor ebx,ebx;"
    "loop:"
        "mov al,[eax];"
        "or al, al;"
        "jz print;"
        "inc ebx;"
        "jmp loop"
    "print:"
    "mov %0, ebx;"
    ".att_syntax prefix;"
    : "=r" (x)
    : "r" (s)
    : "eax", "ebx"
);

    printf("Length of string: %d\n", x);
    return 0;

}

And I get this error:

Error: invalid use of register

Ultimately I want to write a program that searches for the regex pattern ([pq][^a]+a) and prints its start position and length. I wrote it in C, but I have to make it work in assembly. My C code:

#include <stdio.h>
#include <string.h>

int main(){
  char *s = "aqr  b qabxx xryc pqr";
  int i;
  int x=-1, y=0, length=0, pos=0;   /* y starts at 0 so the comparison below is defined */

    int len = strlen(s);
    for(i=0; i<len;i++){
        if((s[i] == 'p' || s[i] == 'q') && length<=0){
            pos = i;
            length++;
            continue;
        } else if((s[i] != 'a') && pos>0){
            length++;
        } else if((s[i] == 'a') && pos>0){
            length++;
            if(y < length) {
                y=length;
                x = pos;
            }
            /* the scan state is reset whether or not this match was the longest */
            length = 0;
            pos = 0;
        }
    }  

    printf("position: %d, length: %d", x, y);
    return 0;

}
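
For reference, the search described above can be sketched in plain C as follows. This is only one reading of the goal: it reports the leftmost match of [pq][^a]+a rather than the longest one, and the function name find_pq_match and its out-parameters are hypothetical, not part of the question.

```c
/* Hypothetical helper: finds the leftmost substring matching [pq][^a]+a.
 * On success it stores the start index and the match length and returns 1;
 * it returns 0 when the string contains no match. */
int find_pq_match(const char *s, int *pos, int *len)
{
    for (int i = 0; s[i] != '\0'; i++) {
        if (s[i] != 'p' && s[i] != 'q')
            continue;                 /* a match has to start with [pq] */

        int j = i + 1;
        while (s[j] != '\0' && s[j] != 'a')
            j++;                      /* consume the run of [^a] characters */

        /* Require at least one [^a] character followed by a literal 'a'. */
        if (s[j] == 'a' && j > i + 1) {
            *pos = i;
            *len = j - i + 1;
            return 1;
        }
    }
    return 0;
}
```

For the string in the question this reports position 1 and length 8: the q at index 1 through the first a at index 8.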

Solution

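You omitted the semicolon after jmp loop and print:. Adjacent string literals are concatenated, so without a separator the assembler sees jmp loopprint: on a single line.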


Also your asm isn't going to work correctly. You move the pointer to s into eax, but then you overwrite it with mov al,[eax]. So the next pass thru the loop, eax doesn't point to the string anymore.

And when you fix that, you need to think about the fact that each pass thru the loop needs to change eax to point to the next character, otherwise mov al,[eax] keeps reading the same character.
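
In C terms, the two fixes amount to keeping the pointer and the loaded byte in separate variables and advancing the pointer on every pass. A rough equivalent of the intended loop, with count_chars as a made-up name used only for illustration:

```c
/* Illustrative C equivalent of the corrected loop (names are hypothetical). */
int count_chars(const char *s)
{
    const char *p = s;   /* plays the role of eax: always points into the string */
    int count = 0;       /* plays the role of ebx */

    for (;;) {
        char c = *p;     /* load the byte into its own variable, not over the pointer */
        if (c == '\0')
            break;       /* jz print */
        count++;         /* inc ebx */
        p++;             /* the missing step: move on to the next character */
    }
    return count;
}
```

In the asm version the same separation means loading the byte into a register other than eax, or comparing the byte in memory directly.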


Since you haven't accepted an answer yet (by clicking the checkmark to the left), there's still time for one more edit.

Normally I don't "do people's homework", but it's been a few days. Presumably the due date for the assignment has passed. Such being the case, here are a few solutions, both for the education of the OP and for future SO users:

1) Following the (somewhat odd) limitations of the assignment:

asm volatile (
    ".intel_syntax noprefix;"
    "mov eax, %1;"
    "xor ebx,ebx;"
    "cmp byte ptr[eax], 0;"
    "jz print;"
    "loop:"
        "inc ebx;"
        "inc eax;"
        "cmp byte ptr[eax], 0;"
        "jnz loop;"
    "print:"
    "mov %0, ebx;"
    ".att_syntax prefix;"
    : "=r" (x)
    : "r" (s)
    : "eax", "ebx"
);

2) Violating some of the assignment rules to make slightly better code:

asm (
    "\n.intel_syntax noprefix\n\t"
    "mov eax, %1\n\t"
    "xor %0,%0\n\t"
    "cmp byte ptr[eax], 0\n\t"
    "jz print\n"
    "loop:\n\t"
        "inc %0\n\t"
        "inc eax\n\t"
        "cmp byte ptr[eax], 0\n\t"
        "jnz loop\n"
    "print:\n"
    ".att_syntax prefix"
    : "=r" (x)
    : "r" (s)
    : "eax", "cc", "memory"
);

This uses 1 fewer register (no ebx) and omits the (unnecessary) volatile qualifier. It also adds the "cc" clobber to indicate that the code modifies the flags, and uses the "memory" clobber to ensure that any 'pending' writes to s get flushed to memory before executing the asm. It also uses formatting (\n\t) so the output from building with -S is readable.

3) Advanced version which uses even fewer registers (no eax), checks to ensure that s is not NULL (returns -1), uses symbolic names and assumes -masm=intel which results in more readable code:

__asm__ (
    "test %[string], %[string]\n\t"
    "jz print\n"
    "loop:\n\t"
        "inc %[length]\n\t"
        "cmp byte ptr[%[string] + %[length]], 0\n\t"
        "jnz loop\n"
    "print:"
    : [length] "=r" (x)
    : [string] "r" (s), "[length]" (-1)
    : "cc", "memory"
);

Getting rid of the (arbitrary and not well thought out) assignment constraints allows us to reduce this to 7 lines (5 if we don't check for NULL, 3 if we don't count labels [which aren't actually instructions]).
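
The control flow of #3 may be easier to follow as plain C. The sketch below mirrors it instruction by instruction; count_chars_indexed is a hypothetical name, and the comments map each statement back to the asm.

```c
#include <stddef.h>  /* NULL */

/* Illustrative C mirror of version #3 (the function name is hypothetical). */
int count_chars_indexed(const char *s)
{
    int length = -1;                    /* the "[length]" (-1) input operand */

    if (s != NULL) {                    /* test string, string / jz print */
        do {
            length++;                   /* inc length: -1 becomes 0 on the first pass */
        } while (s[length] != '\0');    /* cmp byte ptr [string + length], 0 / jnz loop */
    }
    return length;                      /* -1 for NULL, otherwise the character count */
}
```

Starting the counter at -1 is what lets a single inc at the top of the loop serve both as the NULL return value and as the index of the first character.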

There are ways to improve this even further (using %= on the labels to avoid possible duplicate symbol issues, using local labels (.L), even writing it so it works for both -masm=intel and -masm=att, etc.), but I daresay that any of these 3 are better than the code in the original question.


Well Kuba, I'm not sure what more you are after here before you'll accept an answer. Still, it does give me the chance to include Peter's version.

4) Pointer increment:

__asm__ (
    "cmp byte ptr[%[string]], 0\n\t"
    "jz .Lprint%=\n"
    ".Loop%=:\n\t"
    "inc %[length]\n\t"
    "cmp byte ptr[%[length]], 0\n\t"
    "jnz .Loop%=\n"
    ".Lprint%=:\n\t"    
    "sub %[length], %[string]"
    : [length] "=&r" (x)
    : [string] "r" (s), "[length]" (s)
    : "cc", "memory"
);

This does not do the 'NULL pointer' check from #3, but it does do the 'pointer increment' that Peter was recommending. It also avoids potential duplicate symbols (using %=), and uses 'local' labels (ones that start with .L) to avoid extra symbols getting written to the object file.
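
As a sketch of how the remaining idea mentioned earlier (one template that builds under both -masm=att and -masm=intel) might look, here is the pointer-increment loop wrapped in a function. It assumes GCC or Clang targeting x86 and falls back to plain C elsewhere. The name asm_strlen is hypothetical, and the {att|intel} braces are GCC's dialect-alternative syntax rather than anything used in the versions above.

```c
#include <stddef.h>

/* Hypothetical wrapper around the pointer-increment loop. */
size_t asm_strlen(const char *s)
{
    const char *p = s;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ (
        /* {att|intel}: the compiler keeps whichever half matches -masm. */
        "{cmpb $0, (%[p])|cmp byte ptr [%[p]], 0}\n\t"
        "je .Ldone%=\n"
        ".Lnext%=:\n\t"
        "inc %[p]\n\t"                  /* advance to the next character */
        "{cmpb $0, (%[p])|cmp byte ptr [%[p]], 0}\n\t"
        "jne .Lnext%=\n"
        ".Ldone%=:"
        : [p] "+r" (p)                  /* pointer is read and written in place */
        :
        : "cc", "memory"                /* flags change; string memory is read */
    );
#else
    /* Assumption: on other compilers or targets, use the equivalent C loop. */
    while (*p != '\0')
        p++;
#endif
    return (size_t)(p - s);             /* same subtraction as the final sub */
}
```

Using a single "+r" operand avoids the separate matching input, and doing the subtraction in C leaves the compiler free to pick the registers.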

From a "performance" point of view, this might be slightly better (I haven't timed it). However from a "school project" point of view, the clarity of #3 seems like it would be a better choice. From a "what would I write in the real world if for some bizarre reason I HAD to write this in asm instead of just using a standard c function" point of view, I'd probably look at usage, and unless this was performance critical, I'd be tempted to go with #3 in order to ease future maintenance.
