Why is malloc+memset slower than calloc?

Question

It's known that calloc is different than malloc in that it initializes the memory allocated. With calloc, the memory is set to zero. With malloc, the memory is not cleared.

So in everyday work, I regard calloc as malloc+memset. Incidentally, for fun, I wrote the following code for a benchmark.

The results are confusing.

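Code 1:

#include <stdio.h>
#include <stdlib.h>

/* 256 MiB; parenthesized so the macro expands safely inside expressions */
#define BLOCK_SIZE (1024 * 1024 * 256)

int main(void)
{
        int i = 0;
        char *buf[10];

        /* request ten zeroed 256 MiB blocks and never touch them */
        while (i < 10)
        {
                buf[i] = (char *)calloc(1, BLOCK_SIZE);
                i++;
        }
        return 0;
}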

Output of Code 1:

time ./a.out  
**real 0m0.287s**  
user 0m0.095s  
sys 0m0.192s  

Code 2:

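#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 256 MiB; parenthesized so the macro expands safely inside expressions */
#define BLOCK_SIZE (1024 * 1024 * 256)

int main(void)
{
        int i = 0;
        char *buf[10];

        /* request ten 256 MiB blocks and zero each one by hand */
        while (i < 10)
        {
                buf[i] = (char *)malloc(BLOCK_SIZE);
                memset(buf[i], '\0', BLOCK_SIZE);
                i++;
        }
        return 0;
}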

Output of Code 2:

time ./a.out   
**real 0m2.693s**  
user 0m0.973s  
sys 0m1.721s  

Replacing memset with bzero(buf[i],BLOCK_SIZE) in Code 2 produces the same result.

My question is: Why is malloc+memset so much slower than calloc? How can calloc do that?

Recommended answer

The short version: Always use calloc() instead of malloc()+memset(). In most cases, they will be the same. In some cases, calloc() will do less work because it can skip memset() entirely. In other cases, calloc() can even cheat and not allocate any memory! However, malloc()+memset() will always do the full amount of work.

Understanding this requires a short tour of the memory system.

There are four main parts here: your program, the standard library, the kernel, and the page tables. You already know your program, so...

Memory allocators like malloc() and calloc() are mostly there to take small allocations (anything from 1 byte to hundreds of KB) and group them into larger pools of memory. For example, if you allocate 16 bytes, malloc() will first try to get 16 bytes out of one of its pools, and then ask the kernel for more memory when the pool runs dry. However, since the program you're asking about allocates a large amount of memory at once, malloc() and calloc() will just ask for that memory directly from the kernel. The threshold for this behavior depends on your system, but I've seen 1 MiB used as the threshold.

The kernel is responsible for allocating actual RAM to each process and making sure that processes don't interfere with the memory of other processes. This is called memory protection. It has been dirt common since the 1990s, and it's the reason why one program can crash without bringing down the whole system. So when a program needs more memory, it can't just take the memory; instead, it asks the kernel for it using a system call like mmap() or sbrk(). The kernel gives RAM to each process by modifying the page table.

The page table maps memory addresses to actual physical RAM. Your process's addresses, 0x00000000 to 0xFFFFFFFF on a 32-bit system, aren't real memory but instead are addresses in virtual memory. The processor divides these addresses into 4 KiB pages, and each page can be assigned to a different piece of physical RAM by modifying the page table. Only the kernel is permitted to modify the page table.
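To put a number on that, here is a minimal sketch of the page arithmetic for the 256 MiB blocks in the question. The function names are made up for illustration, and the 4 KiB fallback is an assumption; sysconf(_SC_PAGESIZE) is the POSIX way to ask the system what its page size really is.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Number of pages needed to back `bytes` bytes, rounding up.
   4 KiB pages are common, but the real size is system-dependent. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return bytes / page_size + (bytes % page_size != 0);
}

/* Page size of the running system, as reported by POSIX sysconf(). */
size_t system_page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    return n > 0 ? (size_t)n : 4096;  /* fallback value is an assumption */
}
```

With 4 KiB pages, one 256 MiB block spans 65536 pages, so each block in the benchmark stands for 65536 page table entries the kernel may eventually have to fill in.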

Here's how allocating 256 MiB does not work:


  1. Your process calls calloc() and asks for 256 MiB.

  2. The standard library calls mmap() and asks for 256 MiB.

  3. The kernel finds 256 MiB of unused RAM and gives it to your process by modifying the page table.

  4. The standard library zeroes the RAM with memset() and returns from calloc().

  5. Your process eventually exits, and the kernel reclaims the RAM so it can be used by another process.
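The naive model in these steps can be written down directly. This is only a sketch of the "calloc is malloc plus memset" mental model from the question; naive_calloc is a hypothetical name, not what any real C library does for large blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: calloc() modelled as malloc() followed by memset(). */
void *naive_calloc(size_t count, size_t size)
{
    /* Like calloc(), refuse requests whose total size overflows size_t. */
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;

    size_t total = count * size;
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);   /* writes every byte, so it touches every page */
    return p;
}
```

The memset() call is the expensive part: it writes to every page of the block whether or not the program ever looks at that memory again.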

How it actually works

The above process would work, but it just doesn't happen this way. There are three major differences.


  • When your process gets new memory from the kernel, that memory was probably used by some other process previously. This is a security risk. What if that memory has passwords, encryption keys, or secret salsa recipes? To keep sensitive data from leaking, the kernel always scrubs memory before giving it to a process. We might as well scrub the memory by zeroing it, and if new memory is zeroed we might as well make it a guarantee, so mmap() guarantees that the new memory it returns is always zeroed.

  • There are a lot of programs out there that allocate memory but don't use the memory right away. Sometimes memory is allocated but never used. The kernel knows this and is lazy. When you allocate new memory, the kernel doesn't touch the page table at all and doesn't give any RAM to your process. Instead, it finds some address space in your process, makes a note of what is supposed to go there, and makes a promise that it will put RAM there if your program ever actually uses it. When your program tries to read from or write to those addresses, the processor triggers a page fault, and the kernel steps in to assign RAM to those addresses and resumes your program. If you never use the memory, the page fault never happens and your program never actually gets the RAM.

  • Some processes allocate memory and then read from it without modifying it. This means that a lot of pages in memory across different processes may be filled with pristine zeroes returned from mmap(). Since these pages are all the same, the kernel makes all these virtual addresses point to a single shared 4 KiB page of memory filled with zeroes. If you try to write to that memory, the processor triggers another page fault and the kernel steps in to give you a fresh page of zeroes that isn't shared with any other programs.
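The zero-fill guarantee is easy to observe from user space. The sketch below assumes a Linux-like system where mmap() accepts MAP_PRIVATE | MAP_ANONYMOUS; the helper names reserve_zeroed and samples_are_zero are invented for this example. It shows only the observable behaviour (fresh pages read as zero, writes stick), not which physical page the kernel uses behind the scenes.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `len` bytes of anonymous memory; returns NULL on failure. */
unsigned char *reserve_zeroed(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Read one byte every `stride` bytes and report whether all were zero. */
int samples_are_zero(const unsigned char *p, size_t len, size_t stride)
{
    assert(stride > 0);
    for (size_t i = 0; i < len; i += stride)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

Calling reserve_zeroed() on a large length returns almost immediately because no RAM has been handed over yet. Sampling the block reads zeroes everywhere, and a later write to one byte changes only that byte.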

The final process looks roughly like this:

  1. Your process calls calloc() and asks for 256 MiB.

  2. The standard library calls mmap() and asks for 256 MiB.

  3. The kernel finds 256 MiB of unused address space, makes a note about what that address space is now used for, and returns.

  4. The standard library knows that the result of mmap() is always filled with zeroes (or will be once it actually gets some RAM), so it doesn't touch the memory, so there is no page fault, and the RAM is never given to your process.

  5. Your process eventually exits, and the kernel doesn't need to reclaim the RAM because it was never allocated in the first place.
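As a rough illustration of these steps, here is a hypothetical calloc-like front end that goes straight to mmap() for big requests and skips the memset(). Everything here is an assumption made for the sketch: the names sketch_calloc and sketch_free, the 1 MiB threshold, and the requirement that the caller pass the size back when freeing. Real allocators keep their own bookkeeping and are far more involved.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define BIG_THRESHOLD ((size_t)1 << 20)  /* 1 MiB, an illustrative value */

/* Hypothetical allocator front end, not any real libc's code. */
void *sketch_calloc(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;
    size_t total = count * size;

    if (total >= BIG_THRESHOLD) {
        /* Fresh anonymous pages read as zero: no memset(), no page faults. */
        void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    /* Small request: pool memory may be dirty, so clear it by hand. */
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);
    return p;
}

/* Matching release; the caller must pass the same count and size. */
void sketch_free(void *p, size_t count, size_t size)
{
    if (p == NULL)
        return;
    if (count * size >= BIG_THRESHOLD)
        munmap(p, count * size);
    else
        free(p);
}
```

The large path does no per-byte work at all, which is the whole trick: the zeroes are promised by the kernel rather than written by the library.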

If you use memset() to zero the page, memset() will trigger the page fault, cause the RAM to get allocated, and then zero it even though it is already filled with zeroes. This is an enormous amount of extra work, and it explains why calloc() is faster than malloc() and memset(). If you end up using the memory anyway, calloc() is still faster than malloc() and memset(), but the difference is not quite so ridiculous.
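One way to see the page faults rather than infer them from wall-clock time is to count them. The sketch below uses getrusage() and its ru_minflt field (minor page faults), which Linux and the BSDs maintain; the function names are hypothetical, and the 4 KiB stride is an assumed page size. The writes go through a volatile pointer because an optimizing compiler is otherwise free to drop stores of zero into memory that calloc() already zeroed.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Minor page faults taken by this process so far, or -1 on error. */
long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Faults taken while allocating `len` zeroed bytes.
   touch != 0 additionally writes one byte per 4 KiB (assumed page size). */
long faults_for_zeroed_block(size_t len, int touch)
{
    assert(len > 0);
    long before = minor_faults();
    unsigned char *p = calloc(1, len);
    if (p == NULL)
        return -1;
    if (touch) {
        volatile unsigned char *v = p;   /* keep the compiler from eliding this */
        for (size_t i = 0; i < len; i += 4096)
            v[i] = 0;
    }
    long after = minor_faults();
    free(p);
    return after - before;
}
```

On a typical Linux system I would expect the touched variant to report far more faults than the untouched one for a large block, but the exact counts depend on the allocator, on huge page settings, and on the kernel, so treat any numbers as local observations.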

Not all systems have paged virtual memory, so not all systems can use these optimizations. This applies to very old processors like the 80286 as well as embedded processors which are just too small for a sophisticated memory management unit.

This also won't always work with smaller allocations. With smaller allocations, calloc() gets memory from a shared pool instead of going directly to the kernel. In general, the shared pool might have junk data stored in it from old memory that was used and freed with free(), so calloc() could take that memory and call memset() to clear it out. Common implementations will track which parts of the shared pool are pristine and still filled with zeroes, but not all implementations do this.
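A small experiment makes the recycled-pool case concrete. This is a sketch with a made-up function name: it dirties a small block, frees it, and then asks calloc() for the same size. Whether the allocator hands back the very same chunk is up to the implementation, but calloc() has to return zeroes either way.

```c
#include <assert.h>
#include <stdlib.h>

/* Dirty a small block, free it, then calloc() the same size.
   Returns 1 if the calloc()ed block is all zero, 0 if not, -1 on failure. */
int calloc_clears_recycled_block(size_t len)
{
    unsigned char *old = malloc(len);
    if (old == NULL)
        return -1;
    volatile unsigned char *v = old;     /* volatile so the stores really happen */
    for (size_t i = 0; i < len; i++)
        v[i] = 0xAA;                     /* leave junk behind in the pool */
    free(old);

    unsigned char *fresh = calloc(1, len);   /* may well reuse the same chunk */
    if (fresh == NULL)
        return -1;
    int all_zero = 1;
    for (size_t i = 0; i < len; i++)
        if (fresh[i] != 0)
            all_zero = 0;
    free(fresh);
    return all_zero;
}
```

For sizes like 64 bytes the allocator cannot lean on the kernel's zero pages, so somewhere inside calloc() the bytes are actually being cleared.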

Depending on the operating system, the kernel may or may not zero memory in its free time, in case you need to get some zeroed memory later. Linux does not zero memory ahead of time, and Dragonfly BSD recently also removed this feature from their kernel. Some other kernels do zero memory ahead of time, however. Zeroing pages during idle time isn't enough to explain the large performance differences anyway.

The calloc() function is not using some special memory-aligned version of memset(), and that wouldn't make it much faster anyway. Most memset() implementations for modern processors look kind of like this:

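#include <stddef.h>
#include <stdint.h>

/* sketch only: real implementations use processor-specific code */
void *memset_sketch(void *dest, int c, size_t len)
{
    unsigned char *p = dest;
    // one byte at a time, until the dest is aligned...
    while (len > 0 && ((uintptr_t)p & 15)) {
        *p++ = (unsigned char)c;
        len -= 1;
    }
    // now write big chunks at a time (processor-specific)...
    // block size might not be 16, this is only a sketch
    while (len >= 16) {
        // some optimized vector code goes here
        // glibc uses SSE2 when available
        for (int i = 0; i < 16; i++)   // plain-C stand-in for one vector store
            p[i] = (unsigned char)c;
        p += 16;
        len -= 16;
    }
    // the end is not aligned, so one byte at a time
    while (len > 0) {
        *p++ = (unsigned char)c;
        len -= 1;
    }
    return dest;
}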

So you can see, memset() is very fast and you're not really going to get anything better for large blocks of memory.

The fact that memset() is zeroing memory that is already zeroed does mean that the memory gets zeroed twice, but that only explains a 2x performance difference. The performance difference here is much larger (I measured more than three orders of magnitude on my system between malloc()+memset() and calloc()).

Instead of looping 10 times, write a program that allocates memory until malloc() or calloc() returns NULL.

What happens if you add memset()?
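Here is one way that experiment could be set up. It is a sketch with one deliberate difference from the exercise as stated: a max_blocks cap, which is my own addition so the loop always terminates. On a system that overcommits memory, an uncapped loop that also touches every page may get the process killed instead of ever seeing NULL. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate `block`-byte chunks until calloc() returns NULL or `max_blocks`
   is reached; returns how many succeeded. touch != 0 writes to every
   4 KiB (assumed page size) of each chunk. Everything is freed on return. */
size_t count_allocations(size_t block, size_t max_blocks, int touch)
{
    assert(block > 0 && max_blocks > 0);
    void **held = malloc(max_blocks * sizeof *held);
    if (held == NULL)
        return 0;

    size_t n = 0;
    while (n < max_blocks) {
        unsigned char *p = calloc(1, block);
        if (p == NULL)
            break;
        if (touch) {
            volatile unsigned char *v = p;   /* force the stores to happen */
            for (size_t i = 0; i < block; i += 4096)
                v[i] = 1;                    /* nonzero write forces real RAM */
        }
        held[n++] = p;
    }
    for (size_t i = 0; i < n; i++)
        free(held[i]);
    free(held);
    return n;
}
```

Try it with a large block size and a generous cap, first with touch set to 0 and then to 1, and compare both the count and how long each run takes.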
