Get string length in inline GNU Assembler

Question

I am re-learning assembler which I used on very old MS-DOS machines!!!

This is my understanding of what that function should look like. It compiles but crashes with a SIGSEGV when trying to put 0xffffffff in ecx.

The code is run in a VM with 32-bit Debian 9. Any help would be appreciated.

    int getStringLength(const char *pStr){

        int len = 0;
        char *Ptr = pStr;

        __asm__  (
            "movl %1, %%edi\n\t"
            "xor %%al, %%al\n\t"
            "movl 0xffffffff, %%ecx\n\t"
            "repne scasb\n\t"
            "subl %%ecx,%%eax\n\t"
            "movl %%eax,%0"
            :"=r" (len)     /*Output*/
            :"r"(len)       /*Input*/
            :"%eax"         /*Clobbered register*/
        );

        return len;
    }

Recommended answer

The problem with using GCC's inline asm to learn assembly is that you spend half your time learning about how gcc's inline assembly works instead of actually learning assembly. For example here's how I might write this same code:

#include <stdio.h>

int getStringLength(const char *pStr){

    int len;

    __asm__  (
        "repne scasb\n\t"
        "not %%ecx\n\t"
        "dec %%ecx"
        :"=c" (len), "+D"(pStr)     /*Outputs*/
        :"c"(-1), "a"(0)            /*Inputs*/
        /* tell the compiler we read the memory pointed to by pStr,
           with a dummy input so we don't need a "memory" clobber */
        , "m" (*(const struct {char a; char x[];} *) pStr)

    );

    return len;
}

See the compiler's asm output on the Godbolt compiler explorer. The dummy memory input is the tricky part: see the discussion in the comments and on the gcc mailing list for the most efficient way to do this that is still safe.
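
If the dummy "m" operand feels too fragile, a simpler but more conservative option is a "memory" clobber, which tells gcc that the asm may read or write any memory. The following is only a sketch of that alternative, not the answer's own code: the name `string_length_clobber`, the use of `size_t`, and the portable fallback branch for non-x86 targets are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical variant of getStringLength that uses a "memory" clobber
   instead of the dummy "m" input operand. */
size_t string_length_clobber(const char *str)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    size_t count = (size_t)-1;      /* ecx/rcx: largest possible repeat count */

    __asm__ (
        "repne scasb"               /* scan [edi/rdi] until a byte equals al */
        : "+c"(count), "+D"(str)    /* both registers are modified by scasb */
        : "a"(0)                    /* al = 0, the byte to search for */
        : "memory", "cc"            /* reads memory gcc can't see; sets flags */
    );

    /* count was decremented once per byte scanned, terminator included */
    return ~count - 1;
#else
    /* Fallback so the sketch still builds on non-x86 targets. */
    size_t n = 0;
    while (str[n] != '\0')
        n++;
    return n;
#endif
}
```

Calling `string_length_clobber("hello")` should give 5. The trade-off is that a "memory" clobber can force gcc to flush and reload values it had cached in registers around the asm, which the dummy operand avoids.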

Comparing this with your example

  1. I don't initialize len, since the asm declares it as an output (=c).
  2. There's no need to copy pStr since it is a local variable. By spec, we're already allowed to change it (although since it is const we shouldn't modify the data it points to).
  3. There's no reason to tell the inline asm to put Ptr in eax, only to have your asm move it to edi. I just put the value in edi in the first place. Note that since the value in edi is changing, we can't just declare it as an 'input' (by spec, inline asm must not change the value of inputs). Changing it to a read/write output solves this problem.
  4. There's no need to have the asm zero eax, since you can have the constraints do it for you. As a side benefit, gcc will 'know' that it has 0 in the eax register, and (in optimized builds) it can re-use it (think: checking the length of 2 strings).
  5. I can use the constraints to initialize ecx too. As mentioned, changing the value of inputs is not allowed. But since I define ecx as an output, gcc already knows that I'm changing it.
  6. Since the contents of ecx, eax and edi are all explicitly specified, there's no need to clobber anything anymore.

All of which makes for (slightly) shorter and more efficient code.
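
For what it's worth, the SIGSEGV in the question most likely comes from `movl 0xffffffff, %%ecx`: in AT&T syntax an operand without a `$` prefix is a memory address, so the instruction tries to load from address 0xffffffff instead of loading the constant. The original also passes `len` rather than the pointer as its input, changes edi and ecx without telling gcc, and computes the result with `subl %%ecx,%%eax`, which would overshoot the length by 2 even if eax were fully zeroed. A sketch that keeps the question's explicit-instruction style while addressing those points might look like this (the name `getStringLengthFixed` and the fallback loop for non-x86 targets are hypothetical):

```c
/* Hypothetical minimally-changed version of the question's function. */
int getStringLengthFixed(const char *pStr)
{
    int len = 0;    /* the initial value is only used by the fallback loop */

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ (
        "xorl %%eax, %%eax\n\t"          /* al = 0: the byte to search for */
        "movl $0xffffffff, %%ecx\n\t"    /* '$' makes this an immediate */
        "repne scasb\n\t"                /* scan until the terminator */
        "notl %%ecx\n\t"                 /* ecx = bytes scanned */
        "decl %%ecx\n\t"                 /* don't count the terminator */
        "movl %%ecx, %0"
        : "=r"(len), "+D"(pStr)          /* edi/rdi is read and modified */
        :
        : "eax", "ecx", "cc", "memory"   /* everything else the asm touches */
    );
#else
    /* Fallback so the sketch still builds on non-x86 targets. */
    while (pStr[len] != '\0')
        len++;
#endif

    return len;
}
```

Compared with the constraint-based version, this one spends instructions and clobbers on work gcc could have done itself, which is the point the list above is making.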

But this is ridiculous. How the heck (can I say 'heck' on SO?) are you supposed to know all that?

If the goal is to learn asm, using inline asm is not your best approach (in fact I'd say that inline asm is a bad idea in most cases). I'd recommend that you declare getStringLength as an extern and write it completely in asm then link it with your C code.

That way you learn about parameter passing, return values, preserving registers (along with learning which registers must be preserved and which you can safely use as scratch), stack frames, how to link asm with C, etc, etc, etc. All of which is more useful to know than this gobbledygook for inline asm.
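
As a rough sketch of that suggestion, the function can be written entirely in assembly. Normally it would live in its own `.s` file and be linked with the C code. To keep the example in a single file, it is placed in a file-scope `__asm__` block here, which is a shortcut taken for this sketch rather than what the answer describes. The sketch assumes an ELF target such as Linux and the System V calling conventions; the name `asm_strlen` is hypothetical, and the plain C branch exists only so the file still builds elsewhere.

```c
/* C only sees this declaration; the body is defined in assembly below. */
extern int asm_strlen(const char *str);

#if defined(__GNUC__) && defined(__ELF__) && defined(__i386__)
__asm__ (
    ".pushsection .text\n\t"
    ".globl asm_strlen\n\t"
    ".type asm_strlen, @function\n"
    "asm_strlen:\n\t"
    "pushl %edi\n\t"              /* edi is callee-saved in the i386 ABI */
    "movl 8(%esp), %edi\n\t"      /* argument: above saved edi and return address */
    "xorl %eax, %eax\n\t"         /* al = 0: the byte to search for */
    "movl $-1, %ecx\n\t"          /* largest possible repeat count */
    "repne scasb\n\t"
    "notl %ecx\n\t"               /* ecx = bytes scanned */
    "leal -1(%ecx), %eax\n\t"     /* return value goes in eax */
    "popl %edi\n\t"
    "ret\n\t"
    ".size asm_strlen, .-asm_strlen\n\t"
    ".popsection"
);
#elif defined(__GNUC__) && defined(__ELF__) && defined(__x86_64__)
__asm__ (
    ".pushsection .text\n\t"
    ".globl asm_strlen\n\t"
    ".type asm_strlen, @function\n"
    "asm_strlen:\n\t"
    "xorl %eax, %eax\n\t"         /* the argument is already in rdi */
    "movq $-1, %rcx\n\t"
    "repne scasb\n\t"
    "notq %rcx\n\t"
    "leal -1(%rcx), %eax\n\t"     /* return value goes in eax */
    "ret\n\t"
    ".size asm_strlen, .-asm_strlen\n\t"
    ".popsection"
);
#else
/* Plain C stand-in for targets the assembly above does not cover. */
int asm_strlen(const char *str)
{
    int n = 0;
    while (str[n] != '\0')
        n++;
    return n;
}
#endif
```

Note how the 32-bit version has to save and restore edi and fetch its argument from the stack, while the 64-bit version receives it in rdi, a scratch register. Those are exactly the details that inline asm constraints hide from you.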
