How should character arrays be used as strings?


Question

I understand that strings in C are just character arrays. So I tried the following code, but it gives strange results, such as garbage output or program crashes:

#include <stdio.h>

int main (void)
{
  char str [5] = "hello";
  puts(str);
}

Why doesn't this work?

It compiles cleanly with gcc -std=c17 -pedantic-errors -Wall -Wextra.

Note: This post is meant to be used as a canonical FAQ for problems stemming from a failure to allocate room for a NUL terminator when declaring a string.

Recommended answer

A C string is a character array that ends with a null terminator.

All characters have a symbol table value. The null terminator is the symbol value 0 (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere.

Therefore, every time you allocate room for a string, you must include space for the null terminator character. Your example does not do this; it only allocates room for the 5 characters of "hello". Correct code should be:

char str[6] = "hello";

Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator:

char str[5+1] = "hello";

But you can also use this and let the compiler do the counting and pick the size:

char str[] = "hello"; // Will allocate 6 bytes automatically

When allocating memory for a string dynamically in run-time, you also need to allocate room for the null terminator:

char input[n] = ... ;
...
char* str = malloc(strlen(input) + 1);

If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes.

The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", which looks like this: '\0'. This is 100% equivalent to writing 0, but the \ serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as if(str[i] == '\0') will check if the specific character is the null terminator.

Please note that the term null terminator has nothing to do with null pointers or the NULL macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as NUL with one L, not to be confused with NULL or null pointers. See answers to this SO question for further details.

The "hello" in your code is called a string literal. This is to be regarded as a read-only string. The "" syntax means that the compiler will automatically append a null terminator at the end of the string literal. So if you print out sizeof("hello") you will get 6, not 5, because you get the size of the array including the null terminator.

It compiles cleanly with gcc

Indeed, not even a warning. This is because of a subtle detail/flaw in the C language: a character array may be initialized with a string literal that contains exactly as many characters as there is room in the array, in which case the null terminator is silently discarded (C17 6.7.9/14). The language purposely behaves like this for historical reasons; see "Inconsistent gcc diagnostic for string initialization" for details. Also note that C++ is different here and does not allow this trick/flaw to be used.
