Defining a string with no null terminating char (\0) at the end

Question

What are the various ways in C/C++ to define a string with no null terminating char (\0) at the end?

EDIT: I am interested in character arrays only and not in STL string.

Solution

Typically as another poster wrote:

char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};

or, if your current C charset is ASCII, which is usually true (not much EBCDIC around today):

char s[6] = {115, 116, 114, 105, 110, 103};

There is also a largely overlooked way that works only in C (not C++):

char s[6] = "string";

If the array is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied. This is still valid C, but invalid C++.
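As a rough sketch of what you can then do with such an array: printf-style functions accept a precision on %s, and with a precision they never read past that many bytes, so no terminator is needed. The names tag and format_tag are made up for this illustration.

```c
#include <stdio.h>

/* Exactly six elements: valid C, and the terminating 0 is not stored.
   A C++ compiler rejects this initializer. */
static const char tag[6] = "string";

/* Hypothetical helper: formats tag into out, using a precision so that
   snprintf reads at most sizeof tag bytes from the unterminated array.
   Returns whatever snprintf returns. */
int format_tag(char *out, size_t out_size)
{
    return snprintf(out, out_size, "[%.*s]", (int)sizeof tag, tag);
}
```

A plain "%s" without the precision would keep reading past the sixth byte, which is undefined behavior.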

Obviously you can also do it at run time:

char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';

or (same remark on ASCII charset as above)

char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;

Or using memcpy (or memmove, or bcopy, though in this case there is no benefit in doing that), with s being the char s[6] declared above:

memcpy(s, "string", 6);

or strncpy:

strncpy(s, "string", 6);
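A small sketch of why strncpy works here, assuming a six-byte destination: it writes no terminator when the source is at least n characters long, and pads with zeros when the source is shorter. The function names are invented for the example, and some compilers warn about the truncating call precisely because it leaves the result unterminated.

```c
#include <string.h>

/* Source has exactly 6 characters: strncpy copies them and stops,
   so out holds 's','t','r','i','n','g' with no terminator. */
void fill_exact(char out[6])
{
    strncpy(out, "string", 6);
}

/* Source is shorter than 6: strncpy copies "str" and fills the
   remaining three bytes with 0. */
void fill_short(char out[6])
{
    strncpy(out, "str", 6);
}
```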

What should be understood is that there is no such thing as a string type in C (in C++ there are string objects, but that's completely another story). So-called strings are just char arrays. And even the name char is misleading: it is not a character but just a kind of small numerical type. We could probably have called it byte instead, but in the old days there was strange hardware around using 9-bit registers or such, and byte implies 8 bits.

As char will very often be used to store a character code, the C designers thought of a simpler way than storing a number in a char. You can put a letter between single quotes and the compiler understands that it must store this character's code in the char.

What I mean is (for example) that you don't have to write

char c = '\0';

To store a code 0 in a char, you can just write:

char c = 0;
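To make that concrete, here is a tiny sketch (function names are mine) showing that a character constant is only a convenient way to write a number. The value 115 for 's' assumes an ASCII execution character set.

```c
/* Returns 1: '\0' and 0 store exactly the same value in a char. */
int nul_is_zero(void)
{
    char a = '\0';
    char b = 0;
    return a == b;
}

/* Returns the numeric code the compiler stores for 's'
   (115 when the character set is ASCII, an assumption here). */
int code_of_s(void)
{
    return 's';
}
```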

As we very often have to work with a bunch of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. By the way, there is a name for this kind of string representation, "zero terminated string", and if you see the two letters sz at the beginning of a variable name it usually means that its content is a zero terminated string.

"C sz strings" are not a type at all, just an array of chars as normal as, say, an array of int, but the string manipulation functions (strcmp, strcpy, strcat, printf with %s, and many others) understand and rely on the 0 ending convention. That also means that if you have a char array that is not zero terminated, you shouldn't pass it to any of these functions, as they will likely read past the end of the array (or you must be extra careful and use the functions with an n in their name, like strncpy).

The biggest problem with this convention is that there are many cases where it's inefficient. One typical example: you want to put something at the end of a 0 terminated string. If you had kept the size you could just jump to the end of the string; with the sz convention, you have to scan it char by char to find the end. Other kinds of problems occur when dealing with encoded Unicode or such. But at the time C was created this convention was very simple and did the job perfectly.
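Here is a minimal sketch of that difference. The struct counted_buf and both function names are hypothetical, just one way to "keep the size"; the point is that the counted version knows where the end is, while the sz version has to call strlen on the destination first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: the length is stored, no terminator needed. */
struct counted_buf {
    char   data[32];
    size_t len;
};

/* Append n bytes: the end is known from len, so nothing is scanned.
   Returns 0 on success, -1 if there is not enough room. */
int counted_append(struct counted_buf *b, const char *src, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Zero terminated equivalent: strlen must walk dst to find its end.
   dst must already hold a terminated string that fits in dst_size.
   Returns 0 on success, -1 if there is not enough room. */
int sz_append(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t n = strlen(src);
    if (n + 1 > dst_size - used)
        return -1;
    memcpy(dst + used, src, n + 1);  /* n chars plus the terminator */
    return 0;
}
```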

Nowadays, the letters between double quotes like "string" should be treated as read-only. In C++ a string literal is an array of const char, normally handled through a const char *; in C its type is still a plain char array, but modifying it is undefined behavior. Either way, what the pointer points to is a constant that should not be modified (if you want to modify it you must first copy it), and declaring it const is a good thing because it helps to detect many programming errors at compile time.
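A short sketch of the "copy it first" advice, with a made-up function name: the literal is only read through a const pointer, and the change is made in a writable array owned by the caller.

```c
#include <string.h>

/* Copies the literal into the caller's array, then modifies the copy.
   Writing through a pointer to the literal itself would be undefined
   behavior, and the const qualifier lets the compiler reject it. */
void upper_first(char out[7])
{
    const char *lit = "string";
    memcpy(out, lit, 7);   /* 6 characters plus the terminating 0 */
    out[0] = 'S';
}
```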
