How can a file contain null bytes?


Question

How can files contain null bytes on operating systems written in a language with null-terminated strings (namely, C)?

For example, if I run this shell code:

$ printf "Hello\00, World!" > test.txt
$ xxd test.txt
0000000: 4865 6c6c 6f00 2c20 576f 726c 6421       Hello., World!

I see a null byte in test.txt (at least on OS X). If C uses null-terminated strings, and OS X is written in C, why isn't the file cut off at the null byte, so that it contains Hello instead of Hello\00, World!? Is there a fundamental difference between files and strings?

Recommended answer

Null-terminated strings are a C construct used to mark the end of a sequence of characters that is meant to be used as a string. String manipulation functions such as strcmp, strcpy, strchr, and others rely on this terminator to do their work.

But you can still read and write binary data that contains null bytes, both within your program and to and from files. You just can't treat that data as strings. A file is a sequence of bytes whose length is tracked separately, so no byte value is reserved as a terminator.

Here's an example of how this works:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // "wb" opens the file in binary mode, so bytes are written unmodified on every platform
    FILE *fp = fopen("out1","wb");
    if (fp == NULL) {
        perror("fopen failed");
        exit(1);
    }

    int a1[] = { 0x12345678, 0x33220011, 0x0, 0x445566 };
    char a2[] =  { 0x22, 0x33, 0x0, 0x66 };
    char a3[] = "Hello\x0World";

    // this writes the whole array
    fwrite(a1, sizeof(a1[0]), 4, fp);
    // so does this
    fwrite(a2, sizeof(a2[0]), 4, fp);
    // this does not write the whole array -- only "Hello" is written
    fprintf(fp, "%s\n", a3);
    // but this does
    fwrite(a3, sizeof(a3[0]), 12, fp);
    fclose(fp);
    return 0;
}

Contents of out1:

[dbush@db-centos tmp]$ xxd out1
0000000: 7856 3412 1100 2233 0000 0000 6655 4400  xV4..."3....fUD.
0000010: 2233 0066 4865 6c6c 6f0a 4865 6c6c 6f00  "3.fHello.Hello.
0000020: 576f 726c 6400                           World.

For the first array, because we use the fwrite function and tell it to write 4 elements the size of an int, all the values in the array appear in the file. You can see from the output that every value is written, that the values are 32-bit, and that each one is stored in little-endian byte order. We can also see that the second and fourth elements of the array each contain one null byte, while the third element, whose value is 0, consists of 4 null bytes, and all of them appear in the file.

We also use fwrite on the second array, which contains elements of type char, and we again see that all array elements appear in the file. In particular, the third value in the array is 0, a single null byte, and it also appears in the file.

The third array is first written with the fprintf function using a %s format specifier, which expects a string. It writes the first 5 bytes of the array to the file, then encounters the null byte and stops reading the array. It then prints a newline character (0x0a) as the format requires.

The third array is then written to the file again, this time using fwrite. The string constant "Hello\x0World" contains 12 bytes: 5 for "Hello", one for the explicit null byte, 5 for "World", and one for the null byte that implicitly ends the string constant. Since fwrite is given the full size of the array (12), it writes all of those bytes. Indeed, looking at the file contents, we see each of those bytes.

As a side note, in each of the fwrite calls I've hardcoded the array size as the third parameter, instead of using a more dynamic expression such as sizeof(a1)/sizeof(a1[0]), to make it clearer exactly how many elements are being written in each case.
