What is the difference between a null-terminated string and a string that is not terminated by null in x86 assembly language?


Question

I'm currently learning assembly programming by following Kip Irvine's "assembly language x86 programming" book.

In the book, the author states

The most common type of string ends with a null byte (containing 0). Called a null-terminated string

In the subsequent section of the book, the author had a string example without the null byte

greeting1 \
BYTE "Welcome to the Encryption Demo program "

So I was just wondering, what is the difference between a null-terminated string and a string that is not terminated by null in x86 assembly language? Are they interchangeable? Or are they not equivalent to each other?

Answer

There's nothing specific to asm here; it's the same issue in C. It's all about how you store strings in memory and keep track of where they end.

What is the difference between a null-terminated string and a string that is not terminated by null?

A null-terminated string has a 0 byte after it, so you can find the end with strlen (e.g. with a slow repne scasb). This makes it usable as an implicit-length string, like C uses.

The Q&A "NASM Assembly - what is the ", 0" after this variable for?" explains the NASM syntax for creating one in static storage with db. The Q&A "db usage in nasm, try to store and print string" shows what happens when you forget the 0 terminator.
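Since the same issue exists in C, the scan for the terminator can be sketched there. This is only an illustration, not how an optimized strlen is written; the names `scan_length`, `terminated`, and `unterminated` are made up for this sketch. The loop checks one byte at a time until it sees a 0, which is roughly what a repne scasb scan does.

```c
#include <stddef.h>

/* Hypothetical helper: count bytes up to, but not including, the first 0 byte.
   Only safe when a terminator really exists somewhere after s. */
size_t scan_length(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* 5 data bytes followed by an explicit 0, like `db "hello", 0` in NASM. */
const char terminated[] = { 'h', 'e', 'l', 'l', 'o', 0 };

/* 5 data bytes and nothing else; scanning this for a 0 would run past the array. */
const char unterminated[5] = { 'h', 'e', 'l', 'l', 'o' };
```

Calling `scan_length(terminated)` stops at the stored 0. Calling it on `unterminated` would read whatever bytes happen to follow in memory, which is the failure mode of a forgotten terminator.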

Are they interchangeable?

If you know the length of a null-terminated string, you can pass pointer+length to a function that wants an explicit-length string. That function will never look at the 0 byte, because you will pass a length that doesn't include the 0 byte. It's not part of the string data proper.

But if you have a string without a terminator, you can't pass it to a function or system call that wants a null-terminated string. (If the memory is writable and the byte right after the string is free to overwrite, you could store a 0 there to make it into a null-terminated string.)
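Both directions can be sketched in C. The names `str_view`, `view_of`, and `to_terminated` are hypothetical and not from any library. Going from terminated to explicit length only needs the length; going the other way needs a writable buffer with room for one extra byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical explicit-length string: pointer + byte count, no terminator required. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Terminated -> explicit length: the 0 byte is simply left out of the count. */
struct str_view view_of(const char *cstr)
{
    struct str_view v = { cstr, strlen(cstr) };
    return v;
}

/* Explicit length -> terminated: copy into a writable buffer and store a 0
   after the data. Returns 0 on success, -1 if the buffer cannot hold the
   bytes plus the terminator. */
int to_terminated(struct str_view v, char *out, size_t out_size)
{
    if (v.len >= out_size)
        return -1;
    memcpy(out, v.ptr, v.len);
    out[v.len] = '\0';
    return 0;
}
```

Copying is the conservative choice here: it works even when the original bytes live in read-only memory or are immediately followed by other data.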


In Linux, many system calls take strings as C-style implicit-length null-terminated strings. (i.e. just a char* without passing a length).

For example, open(2) takes a string for the path: int open(const char *pathname, int flags); You must pass a null-terminated string to the system call. It's impossible to create a file with a name that includes a '\0' in Linux (same as most other Unix systems), because all the system calls for dealing with files use null-terminated strings.

OTOH, write(2) takes a memory buffer which isn't necessarily a string. It has the signature ssize_t write(int fd, const void *buf, size_t count);. It doesn't care if there's a 0 at buf+count because it only looks at the bytes from buf to buf+count-1.

You can pass a string to write(); it doesn't care whether the buffer is terminated. It's basically just a memcpy into the kernel's page cache (or into a pipe buffer or whatever for non-regular files). But like I said, you can't pass an arbitrary non-terminated buffer as the path arg to open().
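The contrast between the two calls can be sketched in C with the POSIX wrappers. `write_bytes` and `open_path` are hypothetical helper names, and the 256-byte buffer is an arbitrary limit picked for the sketch. Only `open` and `write` themselves are the real interfaces.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write up to len bytes from buf.
   No terminator is needed, and the byte at buf+len is never examined. */
ssize_t write_bytes(int fd, const char *buf, size_t len)
{
    return write(fd, buf, len);
}

/* Hypothetical helper: open() needs a terminated path, so an explicit-length
   path is first copied into a local buffer and terminated there.
   flags must not include O_CREAT, because no mode argument is passed. */
int open_path(const char *path, size_t len, int flags)
{
    char tmp[256];               /* arbitrary size chosen for this sketch */
    if (len >= sizeof tmp)
        return -1;
    memcpy(tmp, path, len);
    tmp[len] = '\0';
    return open(tmp, flags);
}
```

Like write() itself, `write_bytes` may write fewer bytes than requested; a real program would loop on the return value.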

Or are they not equivalent to each other?

Implicit-length and explicit-length are the two major ways of keeping track of string data/constants in memory and passing them around. They solve the same problem, but in opposite ways.

Long implicit-length strings are a bad choice if you sometimes need to find their length before walking through them. Looping through a string is a lot slower than just reading an integer. Finding the length of an implicit-length string is O(n), but for an explicit-length string it is of course O(1), because the length is already known. At least, the length in bytes is known; the length in Unicode characters might not be, if the string is in a variable-length encoding like UTF-8 or UTF-16.
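The last point can be illustrated with a small C sketch. `utf8_code_points` is a hypothetical helper, it assumes the input is valid UTF-8, and it counts code points, which is only one possible meaning of "characters". Even with the byte length in hand, the count still takes a pass over the data.

```c
#include <stddef.h>

/* Hypothetical helper: count UTF-8 code points in an explicit-length buffer.
   Assumes valid UTF-8; continuation bytes have the form 10xxxxxx and are
   not counted, so every other byte starts a new code point. */
size_t utf8_code_points(const char *buf, size_t len_bytes)
{
    size_t count = 0;
    for (size_t i = 0; i < len_bytes; i++) {
        if (((unsigned char)buf[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the 6-byte UTF-8 encoding of "héllo", the byte length is 6 but this function returns 5, because "é" occupies two bytes.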
