Thread safe memory pool
My application is highly performance-critical and requests 3-5 million objects per frame. Initially, to get the ball rolling, I new'd
everything and got the application to work and test my algorithms. The application is multi-threaded.
Once I was happy with the performance, I started to create a memory manager for my objects. The obvious reason is memory fragmentation and wastage. The application could not continue for more than a few frames before crashing due to memory fragmentation. I have checked for memory leaks and know the application is leak free.
So I started creating a simple memory manager using TBB's concurrent_queue. The queue stores the maximum set of elements the application is allowed to use. A class requiring new elements pops elements from the queue. The try_pop method is, according to Intel's documentation, lock-free. This worked quite well as far as memory consumption goes (there is still memory fragmentation, but not nearly as much as before). The problem I am facing now is that the application's performance has slowed down approximately 4 times according to my own simple profiler (I do not have access to commercial profilers or know of any that will work on a real-time application... any recommendations would be appreciated).
My question is: is there a thread-safe memory pool that is scalable? A must-have feature of the pool is fast recycling of elements and making them available. If there is none, any tips/tricks performance-wise?
EDIT: I thought I would explain the problem a bit more. I could easily initialize n arrays, where n is the number of threads, and start using the objects from the arrays per thread. This would work perfectly for some cases. In my case, I am recycling the elements as well (potentially every frame), and they could be recycled from any point in the array; i.e. it may be from the elementArray[0], elementArray[10], or elementArray[1000] part of the array. Now I will have a fragmented array of elements consisting of elements that are ready to be used and elements that are in-use :(
Solution

As said in the comments, don't get a thread-safe memory allocator; allocate memory per-thread.
As you implied in your update, you need to manage free/in-use effectively. That is a pretty straightforward problem, given a constant type and no concurrency.
For example (off the top of my head, untested):
#include <cstddef>
#include <vector>

template<typename T>
class ThreadStorage
{
    std::vector<T> m_objs;
    std::vector<size_t> m_avail;
public:
    explicit ThreadStorage(size_t count) : m_objs(count, T()) {
        m_avail.reserve(count);
        for (size_t i = 0; i < count; ++i) m_avail.push_back(i);
    }
    T* alloc() {
        T* retval = &m_objs[0] + m_avail.back();
        m_avail.pop_back();
        return retval;
    }
    void free(T* p) {
        *p = T(); // Assuming this is enough destruction.
        m_avail.push_back(p - &m_objs[0]);
    }
};
Then, for each thread, have a ThreadStorage instance, and call alloc() and free() as required.
You can add smart pointers to manage calling free() for you, and you can optimise constructor/destructor calling if that's expensive.
You can also look at boost::pool.
Update:
The new requirement for keeping track of things that have been used so that they can be processed in a second pass seems a bit unclear to me. I think you mean that when the primary processing is finished on an object, you need to not release it, but keep a reference to it for second-stage processing. Some objects will just be released back to the pool and not used for second-stage processing.
I assume you want to do this in the same thread.
As a first pass, you could add a method like this to ThreadStorage, and call it when you want to do processing on all unreleased instances of T. No extra bookkeeping required.
// Requires <algorithm> for std::sort and <boost/function.hpp>.
void do_processing(boost::function<void (T* p)> const& f) {
    std::sort(m_avail.begin(), m_avail.end());
    size_t o = 0;
    for (size_t i = 0; i != m_avail.size(); ++i) {
        if (o < m_avail[i]) {
            do {
                f(&m_objs[o]);
            } while (++o < m_avail[i]);
            ++o; // skip the free slot at m_avail[i]
        } else if (o == m_avail[i]) {
            ++o; // slot is free, skip it
        }
    }
    for (; o < m_objs.size(); ++o) f(&m_objs[o]);
}
Assumes no other thread is using the ThreadStorage instance, which is reasonable because it is thread-local by design. Again, off the top of my head, untested.