Why null-terminated strings? Or: null-terminated vs. characters + length storage

Question

I'm writing a language interpreter in C, and my string type contains a length attribute, like so:

struct String
{
    char* characters;
    size_t length;
};

Because of this, I have to spend a lot of time in my interpreter handling this kind of string manually since C doesn't include built-in support for it. I've considered switching to simple null-terminated strings just to comply with the underlying C, but there seem to be a lot of reasons not to:

Bounds-checking is built-in if you use "length" instead of looking for a null.

You have to traverse the entire string to find its length.

You have to do extra stuff to handle a null character in the middle of a null-terminated string.

Null-terminated strings deal poorly with Unicode.

Non-null-terminated strings can intern more, i.e. the characters for "Hello, world" and "Hello" can be stored in the same place, just with different lengths. This can't be done with null-terminated strings.

String slicing (note: strings are immutable in my language). Of the two versions below, the second is obviously slower (and more error-prone: think about adding error-checking of begin and end to both functions).

struct String slice(struct String in, size_t begin, size_t end)
{
    struct String out;
    out.characters = in.characters + begin;
    out.length = end - begin;

    return out;
}

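#include <stdlib.h>

/* Null-terminated version: allocates and copies; the caller must free the result. */
char* slice(char* in, size_t begin, size_t end)
{
    char* out = malloc(end - begin + 1);   /* +1 for the terminator */

    if(out == NULL)
        return NULL;

    for(size_t i = 0; i < end - begin; i++)
        out[i] = in[i + begin];

    out[end - begin] = '\0';

    return out;
}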
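
Two of the points above, embedded null characters and shared storage, can be made concrete with a small sketch. The helpers `string_from` and `string_equal` are hypothetical names, not part of the question, and the struct here uses `const char*` (an assumption) so that it can point at string literals.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as the question's struct String, but with a const pointer
   (an assumption) so it can refer to read-only storage. */
struct String
{
    const char* characters;
    size_t length;
};

/* Hypothetical helper: view the first `length` bytes of `buffer` without copying. */
struct String string_from(const char* buffer, size_t length)
{
    struct String s = { buffer, length };
    return s;
}

/* Hypothetical helper: compare by length and bytes, so an embedded '\0'
   is treated as ordinary data rather than as the end of the string. */
int string_equal(struct String a, struct String b)
{
    return a.length == b.length
        && memcmp(a.characters, b.characters, a.length) == 0;
}
```

With this layout, `string_from(storage, 12)` and `string_from(storage, 5)` over the same "Hello, world" buffer share one allocation, and a three-byte string such as "a\0b" keeps its full length, whereas `strlen` would stop at the first null.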

After all this, my thinking is no longer about whether I should use null-terminated strings: I'm thinking about why C uses them!

So my question is: are there any benefits to null-termination that I'm missing?

Recommended answer

The usual solution is to do both - keep the length and maintain the null terminator. It's not much extra work and means that you are always ready to pass the string to any function.
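
A minimal sketch of that "do both" approach might look like the following. The type `TerminatedString` and its two functions are hypothetical names chosen for illustration; the only invariant assumed is that `characters[length]` is always `'\0'`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the question's struct:
   the length is stored, and characters[length] is always '\0'. */
struct TerminatedString
{
    char* characters;
    size_t length;
};

/* Copy `length` bytes from `source` and append a terminator.
   On allocation failure the result has characters == NULL and length == 0. */
struct TerminatedString terminated_string_new(const char* source, size_t length)
{
    struct TerminatedString s = { NULL, 0 };
    char* buffer = malloc(length + 1);   /* +1 for the terminator */

    if (buffer == NULL)
        return s;

    memcpy(buffer, source, length);
    buffer[length] = '\0';

    s.characters = buffer;
    s.length = length;
    return s;
}

/* Release the buffer and reset the fields. */
void terminated_string_free(struct TerminatedString* s)
{
    free(s->characters);
    s->characters = NULL;
    s->length = 0;
}
```

The `characters` pointer can then be handed directly to functions such as `strcmp` or `printf`, while the interpreter reads `length` in constant time. The trade-off is that a slice from the middle of a string can no longer share storage with its parent, because the terminator has to follow the slice, and C functions will still stop at any embedded null.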

Null-terminated strings are often a drain on performance, for the obvious reason that the time taken to discover the length depends on the length. On the plus side, they are the standard way of representing strings in C, so you have little choice but to support them if you want to use most C libraries.
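
One common way this cost shows up is a loop that asks for the length on every iteration. The two functions below are hypothetical illustrations, not code from the question: the first may rescan the string each time around the loop unless the compiler manages to hoist the call, while the second reads a stored length.

```c
#include <stddef.h>
#include <string.h>

/* Calls strlen in the loop condition: each call may walk the whole string,
   so the loop can cost O(n^2) overall unless the compiler hoists the call. */
size_t count_char_rescanning(const char* text, char wanted)
{
    size_t count = 0;

    for (size_t i = 0; i < strlen(text); i++)
        if (text[i] == wanted)
            count++;

    return count;
}

/* With a stored length the loop bound is a plain read, so the loop is O(n). */
size_t count_char_with_length(const char* text, size_t length, char wanted)
{
    size_t count = 0;

    for (size_t i = 0; i < length; i++)
        if (text[i] == wanted)
            count++;

    return count;
}
```

Both functions return the same count for the same input; only the amount of scanning differs.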
