Is memory allocation in Linux non-blocking?


Question

I am curious to know whether allocating memory using the default new operator is a non-blocking operation.

For example:

struct Node {
    int a,b;
};

...

Node *foo = new Node();

If multiple threads tried to create a new Node and one of them was suspended by the OS in the middle of allocation, would it block the other threads from making progress?

The reason why I ask is because I had a concurrent data structure that created new nodes. I then modified the algorithm to recycle the nodes. The throughput of the two algorithms was virtually identical on a 24-core machine. However, I then created an interference program that ran on all the system cores in order to cause as much OS preemption as possible. The throughput of the algorithm that created new nodes decreased by a factor of 5 relative to the algorithm that recycled nodes.

I'm curious to know why this would occur.

Thanks.

*Edit: pointing me to the code for the C++ memory allocator for Linux would be helpful as well. I tried looking before posting this question, but had trouble finding it.

Answer

It seems to me that if your interference app were using new/delete (malloc/free), then it would interfere more with the non-recycling test. But I don't know how your interference test is implemented.

Depending on how you recycle (i.e., if you use pthread mutexes, God forbid), your recycle code could be slow; gcc atomic ops would be 40x faster in the implementation of recycling.

Malloc, in some variation and for a long time on at least some platforms, has been aware of threads. Use the compiler switches on gcc to be sure you get it. Newer algorithms maintain pools of small memory chunks for each thread, so there is little or no blocking if your thread has a small item available. I have oversimplified this, and it depends on what malloc your system is using. Also, if you go and allocate millions of items to do a test... well, then you won't see that effect, because the small-item pools are limited in size. Or maybe you will; I don't know. If you freed each item right after allocating it, you would be more likely to see the effect. Freed small items go back into the small-item lists rather than the shared heap. Note that "what happens when thread B frees an item allocated by thread A" is a problem that may or may not be dealt with on your version of malloc, and may not be dealt with in a non-blocking manner. For sure, if you didn't immediately free during a large test, then the thread would have to refill its small-item list many times. That can block if more than one thread tries. Finally, at some point your process's heap will ask the system for heap memory, which can obviously block.

So, are you using small memory items? For your malloc I don't know what "small" would be, but if you are under 1k that is for sure small. Are you allocating and freeing one after the other, or allocating thousands of nodes and then freeing thousands of nodes? Was your interference app allocating? All of these things will affect the results.

How to recycle with atomic ops (CAS = compare and swap):

First add a pNextFreeNode to your node object. I used void*; you can use your type. This code is for 32-bit pointers, but works for 64-bit as well. Then make a global recycle pile.

void *_pRecycleHead; // global head of recycle list. 

Add to the recycle pile:

void *Old;
while (1) { // concurrency loop
  Old = _pRecycleHead;  // copy the state of the world. We operate on the copy
  pFreedNode->pNextFreeNode = Old; // chain the new node to the current head of recycled items
  if (CAS(&_pRecycleHead, Old, pFreedNode))  // switch head of recycled items to new node
    break; // success
}

Remove from the pile:

void *Old;
while (Old = _pRecycleHead) { // concurrency loop; only look for recycled items if the head isn't null
  if (CAS(&_pRecycleHead, Old, ((Node*)Old)->pNextFreeNode))  // switch head to head->next
    break; // success
}
pNodeYouCanUseNow = Old;

Using CAS means the operation will succeed only if the item you are changing still holds the Old value you pass in. If there is a race and another thread got there first, then the old value will be different. In real life this race happens very, very rarely. CAS is only slightly slower than actually setting a value, so compared to mutexes... it rocks.

The remove-from-pile above has a race condition if you add and remove the same item rapidly (the ABA problem). We solve that by adding a version number to the CAS'able data. If you update the version number at the same time as the pointer to the head of the recycle pile, you win. Use a union; it costs nothing extra to CAS 64 bits.

union TRecycle {
  struct {
    int iVersion;
    void *pRecycleHead;
  };  // we can set these. Note, I didn't name this struct; you may have to if you want ANSI
  unsigned long long n64;  // we CAS this
};

Note, you will have to go to a 128-bit struct for a 64-bit OS. So the global recycle pile now looks like this:

TRecycle _RecycleHead;

Add to the recycle pile:

while (1) { // concurrency loop
  TRecycle New,Old;
  Old.n64 = _RecycleHead.n64;  // copy state
  New.n64 = Old.n64;  // new state starts as a copy
  pFreedNode->pNextFreeNode = Old.pRecycleHead;  // link item to be recycled into recycle pile
  New.pRecycleHead = pFreedNode;  // make the new state
  New.iVersion++;  // adding item to list increments the version.
  if (CAS(&_RecycleHead.n64, Old.n64, New.n64))  // now if version changed...we fail
    break; // success
}

Remove from the pile:

while (1) { // concurrency loop
  TRecycle New,Old;
  Old.n64 = _RecycleHead.n64;  // copy state
  New.n64 = Old.n64;  // new state starts as a copy
  New.pRecycleHead = ((Node*)Old.pRecycleHead)->pNextFreeNode;  // new head skips over the first item in the recycle list so we can have that item
  New.iVersion++;  // taking an item off the list increments the version
  if (CAS(&_RecycleHead.n64, Old.n64, New.n64))  // we fail if the version is different
    break; // success
}
pNodeYouCanUseNow = Old.pRecycleHead;

I bet if you recycle this way you will see a perf increase.
