What happens to memory after '\0' in a C string?

Problem description

Surprisingly simple/stupid/basic question, but I have no idea: suppose I want to return a C string to the user of my function, and I do not know its length at the beginning of the function. I can place only an upper bound on the length at the outset, and, depending on processing, the size may shrink.

The question is, is there anything wrong with allocating enough heap space (the upper bound) and then terminating the string well short of that during processing? That is, if I stick a '\0' into the middle of the allocated memory, does (a.) free() still work properly, and (b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called? Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?

To give this some context, let's say I want to remove consecutive duplicates, like this:

input "Hello oOOOo !!" --> output "Helo oOo !"

Below is some code showing how I'm pre-computing the size resulting from my operation, effectively performing the processing twice to get the heap size right.

#include <stdlib.h>
#include <string.h>

char* RemoveChains(const char* str)
{
    if (str == NULL) {
        return NULL;
    }
    if (strlen(str) == 0) {
        char* outstr = (char*)malloc(1);
        if (outstr != NULL) {   // malloc can fail
            *outstr = '\0';
        }
        return outstr;
    }
    const char* original = str; // for reuse
    char prev = *str++;       // [prev][str][str+1]...
    size_t outlen = 1;        // first char auto-counted

    // Determine length necessary by mimicking processing
    while (*str) {
        if (*str != prev) { // new char encountered
            ++outlen;
            prev = *str; // restart chain
        }
        ++str; // step pointer along input
    }

    // Declare new string to be perfect size
    char* outstr = (char*)malloc(outlen + 1);
    if (outstr == NULL) {
        return NULL;          // allocation failed
    }
    outstr[outlen] = '\0';
    outstr[0] = original[0];
    outlen = 1;

    // Construct output
    prev = *original++;
    while (*original) {
        if (*original != prev) {
            outstr[outlen++] = *original;
            prev = *original;
        }
        ++original;
    }
    return outstr;
}
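For comparison, here is a minimal sketch of the single-pass alternative the question asks about: allocate the upper bound (strlen(str) + 1 bytes, since removing duplicates can never make the string longer), write the output once, and terminate it early with '\0'. The name RemoveChainsOnePass is hypothetical, and the trailing realloc() is just one optional way to hand back the unused tail.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-pass variant: allocate the upper bound, terminate early. */
char* RemoveChainsOnePass(const char* str)
{
    if (str == NULL) {
        return NULL;
    }
    size_t cap = strlen(str) + 1;       /* output can never be longer than input */
    char* outstr = malloc(cap);
    if (outstr == NULL) {
        return NULL;
    }

    size_t outlen = 0;
    for (const char* p = str; *p != '\0'; ++p) {
        /* keep a char only if it is the first one or differs from the previous input char */
        if (p == str || *p != p[-1]) {
            outstr[outlen++] = *p;
        }
    }
    outstr[outlen] = '\0';              /* bytes after this are simply unused */

    /* Optional: shrink to fit; if realloc fails, the original block is still valid. */
    char* shrunk = realloc(outstr, outlen + 1);
    return shrunk != NULL ? shrunk : outstr;
}
```

The trade-off is the one discussed in the answer: one pass and a possibly oversized block, versus two passes and an exact-size block.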

Solution

If I stick a '\0' into the middle of the allocated memory, does

(a.) free() still work properly, and

Yes.
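A minimal illustration: free() only needs the pointer that malloc() returned. It never inspects the string contents, so an early '\0' does not affect it. The 64-byte size and the helper name make_short_string are arbitrary choices for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a generous buffer but store a short string in it. */
char* make_short_string(void)
{
    char* buf = malloc(64);             /* upper bound */
    if (buf == NULL) {
        return NULL;
    }
    strcpy(buf, "hi");                  /* writes 'h', 'i', '\0'; bytes 3..63 are left alone */
    return buf;
}

/* The caller releases the whole 64-byte block with a plain free(buf);
   where the '\0' sits makes no difference to the allocator. */
```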

(b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called?

It depends. Often, when you allocate large amounts of heap space, the system first allocates only virtual address space; as you write to the pages, actual physical memory is assigned to back them (and that memory may later get swapped out to disk if your OS has virtual memory support). Famously, this distinction between wasteful allocation of virtual address space and actual physical/swap memory allows sparse arrays to be reasonably memory efficient on such OSs.
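As a rough sketch of the sparse-array point, assuming an OS that commits physical pages lazily (for example Linux with its default overcommit behaviour): a large calloc() mostly reserves address space, and only the pages actually written need backing memory. The 16 MiB size and the function name are arbitrary, and whether calloc() avoids touching the untouched pages is up to the allocator.

```c
#include <stdlib.h>

#define SPARSE_LEN ((size_t)4 * 1024 * 1024)    /* 4 Mi ints = 16 MiB of address space */

/* Hypothetical demo: a large zeroed array in which only a few entries are set.
   On a lazily-committing OS, only the pages holding these entries need RAM. */
int* make_sparse_array(void)
{
    int* a = calloc(SPARSE_LEN, sizeof *a);
    if (a == NULL) {
        return NULL;
    }
    a[0] = 1;
    a[SPARSE_LEN / 2] = 2;
    a[SPARSE_LEN - 1] = 3;
    return a;
}
```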

Now, the granularity of this virtual addressing and paging is the memory page size, which might be 4k, 8k, 16k or more. Most OSs have a function you can call to find out the page size. So, if you're doing a lot of small allocations, rounding up to page sizes is wasteful; and if you have a limited address space relative to the amount of memory you really need to use (for example, 4GB of RAM with 32-bit addressing), then relying on virtual addressing in the way described above won't scale. On the other hand, if you have a 64-bit process running with, say, 32GB of RAM, and are doing relatively few such string allocations, you have an enormous amount of virtual address space to play with, and the rounding up to page size won't amount to much.
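On POSIX systems one such function is sysconf(_SC_PAGESIZE) (on Windows, GetSystemInfo() reports it in dwPageSize). Below is a small sketch of querying it and rounding a byte count up to whole pages, which is roughly the granularity at which backing memory gets committed. The helper names and the 4096 fallback are assumptions for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Round n up to the next multiple of page (page must be non-zero). */
size_t round_up_to_page(size_t n, size_t page)
{
    return (n + page - 1) / page * page;
}

/* Query the page size at run time; fall back to 4096 if sysconf() fails. */
size_t page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (size_t)p : 4096;
}
```

For example, with 4096-byte pages a buffer whose first 5000 bytes are written needs round_up_to_page(5000, 4096) == 8192 bytes of backing memory, assuming the buffer starts on a page boundary.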

Note, though, the difference between writing throughout the buffer and then terminating it at some earlier point (in which case the once-written-to memory will have backing memory and could end up in swap), versus having a big buffer in which you only ever write to the first bit and then terminate (in which case backing memory is only allocated for the used space, rounded up to page size).
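To make the contrast concrete, here is a sketch of the two patterns. Both return the same string, but the first writes every byte of the buffer (so every page of it needs backing memory), while the second writes only the bytes it uses. The function names and the memset() fill are illustrative assumptions, not code from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Pattern A: touch the whole buffer, then terminate early.
   Every page of the block is dirtied and needs backing memory. */
char* fill_then_terminate(size_t cap, const char* text)
{
    char* buf = malloc(cap);
    if (buf == NULL) {
        return NULL;
    }
    memset(buf, ' ', cap);              /* writes all cap bytes */
    size_t n = strlen(text);
    if (n >= cap) {
        n = cap - 1;                    /* truncate to fit (cap must be >= 1) */
    }
    memcpy(buf, text, n);
    buf[n] = '\0';
    return buf;
}

/* Pattern B: write only the prefix that is actually used.
   Pages well beyond the string may never be touched by this code. */
char* write_prefix_only(size_t cap, const char* text)
{
    char* buf = malloc(cap);
    if (buf == NULL) {
        return NULL;
    }
    size_t n = strlen(text);
    if (n >= cap) {
        n = cap - 1;                    /* truncate to fit (cap must be >= 1) */
    }
    memcpy(buf, text, n);
    buf[n] = '\0';
    return buf;
}
```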

It's also worth pointing out that on many operating systems heap memory may not be returned to the operating system until the process terminates: instead, the malloc/free library notifies the OS when it needs to grow the heap (e.g. using sbrk() on UNIX or VirtualAlloc() on Windows). In that sense, free()d memory is free for your process to reuse, but not free for other processes to use. Some operating systems do optimise this, for example by using a distinct and independently releasable memory region for very large allocations.
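Where the C library supports it, you can explicitly ask the allocator to hand free()d heap memory back to the OS. The sketch below assumes glibc, whose <malloc.h> declares malloc_trim(); on other libraries it does nothing. The wrapper name is hypothetical, and whether any memory is actually released depends on the heap layout at the time.

```c
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>
#endif

/* Hypothetical wrapper: ask the allocator to return unused heap pages to the OS.
   Returns 1 if glibc reports that memory was released, 0 otherwise. */
int release_free_heap_to_os(void)
{
#if defined(__GLIBC__)
    return malloc_trim(0);              /* glibc-specific; pad 0 keeps no extra slack */
#else
    return 0;                           /* no portable equivalent in standard C */
#endif
}
```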

Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?

Again, it depends on how many such allocations you're dealing with. If there are a great many relative to your virtual address space or RAM, you want to explicitly let the memory library know, using realloc(), that not all of the originally requested memory is actually needed; or you could even use strdup() to allocate a new block more tightly based on actual needs (then free() the original). Depending on your malloc/free library implementation, either approach might work out better or worse, but very few applications would be significantly affected by any difference.
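A minimal sketch of both options, assuming buf holds a '\0'-terminated string inside a larger malloc()ed block. The copy variant does what strdup() does but is written with malloc() and memcpy(), so it does not depend on strdup() being declared (it is POSIX, and only standard since C23). The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Option 1: shrink with realloc() (the allocator may keep or move the block). */
char* shrink_with_realloc(char* buf)
{
    char* shrunk = realloc(buf, strlen(buf) + 1);
    return shrunk != NULL ? shrunk : buf;   /* on failure the old block is still valid */
}

/* Option 2: strdup()-style copy into a right-sized block, then free the original. */
char* shrink_with_copy(char* buf)
{
    size_t size = strlen(buf) + 1;
    char* copy = malloc(size);
    if (copy == NULL) {
        return buf;                          /* keep the oversized block rather than lose it */
    }
    memcpy(copy, buf, size);
    free(buf);
    return copy;
}
```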

Sometimes your code may be in a library where you can't guess how many string instances the calling application will be managing. In such cases it's better to provide slower behaviour that never gets too bad, so lean towards shrinking the memory blocks to fit the string data (a fixed number of additional operations, so it doesn't affect big-O efficiency) rather than having an unknown proportion of the original string buffer wasted (in a pathological case, zero or one character used after arbitrarily large allocations). As a performance optimisation, you might only bother returning memory if the unused space is >= the used space; tune to taste, or make it caller-configurable.
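The threshold idea could be sketched as follows. Here capacity is the size originally passed to malloc(), and the caller-supplied ratio (1 means "shrink only when unused >= used") is a hypothetical knob rather than an established API.

```c
#include <stdlib.h>
#include <string.h>

/* Shrink buf to fit its string only when the unused tail is at least
   `ratio` times the used part (used includes the '\0').
   Returns the (possibly moved) block; on realloc failure the old block is kept. */
char* shrink_if_wasteful(char* buf, size_t capacity, size_t ratio)
{
    size_t used = strlen(buf) + 1;
    size_t unused = capacity - used;        /* assumes used <= capacity */
    if (unused < ratio * used) {
        return buf;                         /* waste is tolerable: skip the realloc */
    }
    char* shrunk = realloc(buf, used);
    return shrunk != NULL ? shrunk : buf;
}
```

With ratio 1, a 1000-byte block holding "short" (6 bytes used) is shrunk, while an 8-byte block holding "abcd" (5 used, 3 unused) is returned unchanged.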

You commented on another answer:

So it comes down to judging whether the realloc will take longer, or the preprocessing size determination?

If performance is your top priority, then yes - you'd want to profile. If you're not CPU bound, then as a general rule take the "preprocessing" hit and do a right-sized allocation - there's just less fragmentation and mess. Countering that, if you have to write a special preprocessing mode for some function - that's an extra "surface" for errors and code to maintain. (This trade-off decision is commonly needed when implementing your own asprintf() from snprintf(), but there at least you can trust snprintf() to act as documented and don't personally have to maintain it).
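For reference, the two-pass pattern behind a home-made asprintf() looks roughly like this: call vsnprintf() once with a NULL buffer to measure, allocate exactly, then format for real. my_asprintf is a hypothetical name (GNU and BSD C libraries already ship a non-standard asprintf()). The va_list is copied with va_copy() because it cannot be reused after the first vsnprintf() call.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical asprintf()-style helper built on vsnprintf().
   Returns a malloc()ed string the caller must free(), or NULL on error. */
char* my_asprintf(const char* fmt, ...)
{
    va_list args, args_copy;
    va_start(args, fmt);
    va_copy(args_copy, args);

    /* Pass 1: measure. vsnprintf(NULL, 0, ...) returns the length needed. */
    int len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (len < 0) {
        va_end(args_copy);
        return NULL;
    }

    /* Pass 2: allocate exactly len + 1 bytes and format for real. */
    char* out = malloc((size_t)len + 1);
    if (out != NULL) {
        vsnprintf(out, (size_t)len + 1, fmt, args_copy);
    }
    va_end(args_copy);
    return out;
}
```

Usage: my_asprintf("%s-%d", "abc", 42) returns a 7-byte block holding "abc-42".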
