What's the rationale for null terminated strings?

Question

As much as I love C and C++, I can't help but scratch my head at the choice of null terminated strings:


  • Length prefixed (i.e. Pascal) strings existed before C
  • Length prefixed strings make several algorithms faster by allowing constant time length lookup.
  • Length prefixed strings make it more difficult to cause buffer overrun errors.
  • Even on a 32-bit machine, if you allow the string to be the size of available memory, a length prefixed string is only three bytes wider than a null terminated string. On 16-bit machines this is a single byte. On 64-bit machines, 4GB is a reasonable string length limit, but even if you want to expand it to the size of the machine word, 64-bit machines usually have ample memory, making the extra seven bytes sort of a null argument. I know the original C standard was written for insanely poor machines (in terms of memory), but the efficiency argument doesn't sell me here.
  • Pretty much every other language (e.g. Perl, Pascal, Python, Java, C#) uses length prefixed strings. These languages usually beat C in string manipulation benchmarks because they are more efficient with strings.
  • C++ rectified this a bit with the std::basic_string template, but plain character arrays expecting null terminated strings are still pervasive. This is also imperfect because it requires heap allocation.
  • Null terminated strings have to reserve a character (namely, null), which cannot exist in the string, while length prefixed strings can contain embedded nulls.
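
To make the constant-time length and embedded-null points concrete, here is a minimal C sketch of an in-line length-prefixed string. The type `lpstr` and the functions `lp_from_bytes`, `lp_length` and `cstr_length` are hypothetical names for illustration, not part of any standard library. The layout is also an assumption: a `size_t` count immediately followed by the bytes, using a C99 flexible array member.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-line length-prefixed string: the count sits in the same
   allocation, immediately before the characters (C99 flexible array member).
   The bytes are not null-terminated and may themselves contain '\0'. */
typedef struct {
    size_t len;
    char data[];
} lpstr;

/* Copies n raw bytes (embedded nulls allowed) into a new lpstr.
   Assumes n is small enough that sizeof(lpstr) + n does not overflow.
   Returns NULL if allocation fails; the caller frees the result. */
lpstr *lp_from_bytes(const char *bytes, size_t n) {
    lpstr *s = malloc(sizeof *s + n);
    if (s == NULL)
        return NULL;
    s->len = n;
    memcpy(s->data, bytes, n);
    return s;
}

/* O(1): the length is read from the prefix, never computed by scanning. */
size_t lp_length(const lpstr *s) {
    assert(s != NULL);
    return s->len;
}

/* O(n) for comparison: strlen walks the bytes until the first '\0'. */
size_t cstr_length(const char *s) {
    assert(s != NULL);
    return strlen(s);
}
```

For example, `lp_from_bytes("ab\0cd", 5)` reports a length of 5. By contrast, `cstr_length("ab\0cd")` stops at the embedded null and returns 2.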

Several of these things have come to light more recently than C, so it would make sense for C not to have known of them. However, several were well known long before C came to be. Why would null terminated strings have been chosen instead of the obviously superior length prefixing?

EDIT: Since some asked for facts (and didn't like the ones I already provided) on my efficiency point above, they stem from a few things:


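  • Concatenation using null terminated strings requires O(n + m) time. Length prefixing often requires only O(m).
  • Computing the length of a null terminated string requires O(n) time. With length prefixing it is O(1).
  • Length and concat are by far the most common string operations. There are several cases where null terminated strings can be more efficient, but these occur much less often.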
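
The concatenation claim can be sketched as follows. The sketch assumes a fixed-capacity buffer to keep the example short. `cstr_append` behaves like `strcat`: it must first scan `dst` for its terminator. The hypothetical `lpbuf`/`lp_append` pair instead starts writing at the stored length, so only the `m` new bytes are touched. All names and the 64-byte capacity are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Null-terminated append, like strcat: dst must already hold a C string.
   strlen(dst) rescans the existing n characters before the m new ones
   are copied, so each call costs O(n + m). */
void cstr_append(char *dst, size_t cap, const char *src) {
    size_t n = strlen(dst);          /* O(n): find the old terminator */
    size_t m = strlen(src);          /* O(m) */
    assert(n + m + 1 <= cap);        /* caller must provide enough room */
    memcpy(dst + n, src, m + 1);     /* copies src's terminator too */
}

#define LP_CAP 64

/* Hypothetical length-prefixed buffer; data is not null-terminated,
   len records how many bytes are in use. */
typedef struct {
    size_t len;
    char data[LP_CAP];
} lpbuf;

/* Appends m bytes in O(m): the end is known, nothing is rescanned. */
void lp_append(lpbuf *b, const char *src, size_t m) {
    assert(b != NULL && m <= LP_CAP - b->len);
    memcpy(b->data + b->len, src, m);
    b->len += m;
}
```

Each `cstr_append` call rescans the whole destination. Building a string from k fixed-size pieces in a loop therefore does scanning work that grows quadratically with k. Repeated `lp_append` calls only touch the new bytes.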

From answers below, these are some cases where null terminated strings are more efficient:


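  • When you need to cut off the start of a string and pass it to some method. You can't really do this in constant time with length prefixing, even if you are allowed to destroy the original string, because the length prefix may need to follow alignment rules.
  • In some cases where you're just looping through the string character by character, you might be able to save a CPU register. Note that this works only if you haven't dynamically allocated the string (because then you'd have to free it, which forces you to use that CPU register to hold the pointer you originally got from malloc and friends).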
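
Here is a sketch of the first case, plus the kind of loop the second case refers to. `cstr_drop_prefix` returns `s + k`, which is already a valid C string. The hypothetical in-line `lpstr` has no length word in front of `data + k`. So to produce a suffix that is itself an `lpstr`, this sketch has to allocate and copy. A separate pointer-and-length view would avoid the copy, but it would no longer be a single in-line prefixed string. `lpstr`, `lp_drop_prefix` and `count_char` are illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Dropping the first k characters of a C string is pointer arithmetic:
   s + k is itself a valid null-terminated string. O(1), no copy.
   The caller must ensure k <= strlen(s). */
const char *cstr_drop_prefix(const char *s, size_t k) {
    assert(s != NULL);
    return s + k;
}

/* Hypothetical in-line length-prefixed string, as described in the question. */
typedef struct {
    size_t len;
    char data[];
} lpstr;

/* s->data + k has no length word in front of it, so producing a suffix
   that is itself an lpstr means allocating and copying len - k bytes.
   Returns NULL if allocation fails; the caller frees the result. */
lpstr *lp_drop_prefix(const lpstr *s, size_t k) {
    assert(s != NULL && k <= s->len);
    size_t n = s->len - k;
    lpstr *t = malloc(sizeof *t + n);
    if (t == NULL)
        return NULL;
    t->len = n;
    memcpy(t->data, s->data + k, n);
    return t;
}

/* Walking a C string needs only the moving pointer: the terminator ends
   the loop, so no separate length or end index has to be kept live. */
size_t count_char(const char *p, char c) {
    size_t count = 0;
    for (; *p != '\0'; ++p)
        if (*p == c)
            ++count;
    return count;
}
```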

None of the above are nearly as common as length and concat.

There's one more asserted in the answers below:


  • You need to cut off the end of the string

But this one is incorrect: it takes the same amount of time for null terminated and length prefixed strings. (Null terminated strings just stick a null where you want the new end to be; length prefixers just subtract from the prefix.)
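
Both truncations are a single store, which is the point being made here. `cstr_truncate`, `lpbuf` and `lp_truncate` are hypothetical names. The fixed 64-byte buffer is an assumption to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Truncating a C string to its first k characters: write a terminator.
   O(1). The caller must ensure k <= strlen(s). */
void cstr_truncate(char *s, size_t k) {
    assert(s != NULL);
    s[k] = '\0';
}

#define LP_CAP 64

/* Hypothetical length-prefixed buffer; len records how many bytes are used. */
typedef struct {
    size_t len;
    char data[LP_CAP];
} lpbuf;

/* Truncating a length-prefixed string: lower the count. O(1). */
void lp_truncate(lpbuf *b, size_t k) {
    assert(b != NULL && k <= b->len);
    b->len = k;
}
```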

Recommended answer

From the horse's mouth

None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled *e. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.

Dennis M. Ritchie, "The Development of the C Language"
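
The length limit Ritchie mentions can be illustrated with a sketch. It packs a string BCPL-style, with the count in a single 8-bit byte, next to a terminator-style version. Both functions are hypothetical. The terminator is C's `'\0'` rather than B's `*e`. Real BCPL and B also packed characters into machine words (cells) rather than into a plain byte array as done here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* BCPL-style packing (sketch): out[0] holds the character count, so an
   8-bit count slot caps the string at 255 characters.
   Returns false if the string is too long or out is too small. */
bool pack_counted(unsigned char *out, size_t out_size, const char *s, size_t n) {
    if (n > 255 || n + 1 > out_size)
        return false;
    out[0] = (unsigned char)n;
    memcpy(out + 1, s, n);
    return true;
}

/* Terminator-style packing, as in B and C (C uses '\0' where B used *e):
   there is no count to overflow, but s must not contain the terminator.
   Returns false if out is too small. */
bool pack_terminated(char *out, size_t out_size, const char *s, size_t n) {
    assert(memchr(s, '\0', n) == NULL);
    if (n + 1 > out_size)
        return false;
    memcpy(out, s, n);
    out[n] = '\0';
    return true;
}
```

With a 300-character input, `pack_counted` refuses anything over 255 characters. `pack_terminated` accepts all 300 as long as the buffer is large enough. The price is that the terminator character can never appear inside the string.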
