此多线程用例的最佳数据结构:侵入式列表好吗? [英] Best Data Structure for this Multithreaded Use Case: Is Intrusive List good?

查看:32
本文介绍了此多线程用例的最佳数据结构:侵入式列表好吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要设计一个支持以下操作的数据结构:

I need to design a data structure that supports the following operations:

    数据结构中的
  • 搜索元素基于作为间隔的关键字.例如,间隔1-5之内的值可以是3,6-11的值可以是5,依此类推.间隔是连续的,并且它们之间没有重叠.
  • 查找上一个和下一个间隔-这几乎与搜索间隔一样频繁.
  • 分割时间间隔,加入连续的时间间隔
  • 并发性:我已将设计限制为一个写入器线程和其他读取器线程,如下所示.编写器线程可以拆分或加入间隔,也可以修改间隔中的值.任何读取器线程只能在一个时间间隔内读取值(读取器可以读取多个时间间隔,但是我不必序列化所有读取-在两个读取之间进行写入是可以的).每个读取器每次写入大约有20-80次读取.此外,我仍然必须确定读者数量,但大约是2-8.

我考虑使用列表在中间添加和删除元素.间隔的数量有限-因此使用map可能不合适.STL列表未很好地支持这种访问方式(一个作者,多个读者).boost :: intrusive :: list似乎合适.在侵入式列表的顶部,我将必须获得锁才能读取/写入间隔.

I consider using list for adding and deleting elements in the middle. There would only be a limited number of intervals - so probably using map won't be right. This kind of access (one writer, multiple readers) is not well-supported by STL list. boost::intrusive::list seems appropriate. On top of the intrusive list, I will have to acquire locks to read/write the intervals.

此外,我知道,与STL列表相比,侵入式列表可以用于更好的缓存位置(以及对所包含对象的适当内存分配).

Also, I understand intrusive list may be used for better cache locality (along with appropriate memory allocation for the contained objects) than STL list.

该方法还好吗?如果是,我也很想了解您使用intrusive :: list的经验,尤其是对于多线程应用程序.

Is the approach alright? If yes, I would also be interested to know about you experience with using intrusive::list, particularly for a multithreaded application.

推荐答案

您在这里遇到2个不同的问题:

You have 2 different issues here:

  1. 如何表示您的数据结构
  2. 如何在高效的庄园中确保线程安全

您的数据结构每次写入将进行(20-80)x(2-8)次读取.

Your data structure will be doing (20-80) x (2-8) reads for every write.

(1).首先,假设您的范围如下所示为数据结构

(1). First, lets assume your range is a data structure as follows


    struct Interval
    {
        Interval(int start, int length)
          : m_start(start),
            m_length(length)
        {}
        int m_start;
        int m_length;
        int value; // Or whatever
    };

由于读取的数量远远超过写入的数量,因此查找需要快速,而修改则不能.

Since reads massively outnumber writes, lookup needs to be fast, while modifications don't.

为数据结构使用列表意味着进行O(N)查找和O(1)修改-完全是错误的方法.

Using a list for your data structure means O(N) lookups and O(1) modification - exactly the wrong way around.

您的结构最简单的表示形式是向量.如果按排序顺序保留间隔,则查找为O(logN),修改为O(N).

The simplest possible representation of your structure is a vector. If the intervals are held in sorted order, lookups are O(logN) and modifications are O(N).

要实现此目的,只需向Interval添加一个比较器:

To implement this, just add a comparator to Interval:

bool operator<(const Interval& rhs) const
{
    return m_start < rhs.m_start;
}

然后您可以使用 std :: lower_bound 在O(logN)中查找等于或小于搜索间隔的第一个间隔.

You can then use std::lower_bound to find the first interval equal or lower to your search interval in O(logN).

下一个和上一个间隔为O(1)-递减或递增返回的迭代器.

Next and previous interval are O(1) - decrement or increment the returned iterator.

分割间隔意味着在当前元素之后插入一个新元素,并调整当前元素的长度-O(N).

Splitting an interval means inserting a new element after the current one and adjusting the length of the current - O(N).

加入两个间隔意味着将下一个间隔的长度加到当前间隔的长度上,并擦除下一个间隔-O(N).

Joining two intervals means adding the length of the next one to the current one and erasing the next one - O(N).

您应该在向量中 reserve()留出足够的空间,以获取最大数量的元素,以最小化调整大小的开销.

You should reserve() enough space in the vector for the maximum number of elements to minimise resizing overhead.

(2).继Knuth之后,"过早的优化是万恶之源".

(2). Following Knuth, 'premature optimisation is the root of all evil'.

在保存向量< Interval> 的结构上进行一次单独的读/写锁定就足够了.唯一可能的问题是(2a)由于读者垄断了写入者而导致的写入器饥饿,或者(2b)由于写入者的更新花费了太长时间而导致的读取器饥饿.

A single read/write lock on the structure holding your vector<Interval> will more than likely be sufficient. The only possible problems are (2a) writer starvation because readers monopolise the lock, or (2b) reader starvation because the writers updates take too long.

(2a)如果(且仅当)您面临作家饥饿时,可以使锁定更加精细.极端可能并非如此.为此:

(2a) If (and only if) you face writer starvation, you could make the locking more granular. Its extremely likely that this won't be the case though. To do this:

让矢量通过指针而不是值保持其间隔.这样一来,调整大小就不会在内存中移动对象.让每个间隔都包含一个读/写锁.

Make your vector hold its intervals by pointer, not value. This is so the resizes do not move your objects around in memory. Have each interval contain a read/write lock.

阅读:读取集合的读取锁,然后获取所需的时间间隔.如果不需要读取其他任何间隔,则在获取间隔锁后立即放弃收集锁,以允许其他线程继续运行.

For reads: Take the read lock of the collection, then of the interval you want. If you don't need to read any other intervals, give up the collection lock as soon as you have acquired the interval lock, to allow other threads to proceed.

如果您需要读取其他存储桶,则可以以任何顺序对其进行读取锁定,直到放弃收集读取锁定为止,此时编写器可以添加或删除您尚未锁定的任何时间间隔.获取这些锁的顺序无关紧要,因为在您对集合持有a读取锁的同时,编写者无法更改向量,并且读取锁也无法竞争.

If you need to read other buckets, you can read-lock them in any order until you give up the collection read lock, at which point the writer may add or remove any intervals you havent locked. Order doesn't matter while acquiring these locks since the writer cannot be changing the vector while you hold the a read lock on the collection, and read locks do not contend.

写操作:

获取集合的写入锁,然后获取所需的时间间隔.请注意,对于所有将添加或删除间隔的更新,您都必须对其集合写锁进行锁定.如果仅更新一个时间间隔,则可以放弃收集锁定.否则,您需要保持写锁并在要修改的任何时间间隔上获取写锁.您可以以任何顺序获取间隔锁,因为没有收集锁的读者就无法获取新的读取锁.

Take the write lock of the collection, then of the interval you want. Note that you must hold the collection write lock for the entirety of of any updates that will add or remove intervals. You can give up the collection lock if you are only updating one interval. Otherwise, you need to hold the write lock and acquire a write lock on any intervals you will be modifying. You can acquire the interval locks in any order, since no readers can acquire new read locks without the collection lock.

以上方法对编写者线程更自私,可以消除饥饿.

The above works by being more selfish to the writer thread, which should eliminate starvation.

(2b)如果遇到读者饥饿的情况,这种情况发生的可能性更小,最好的解决方案是将集合的读写操作分开.通过共享指针保留该集合,并对其具有一个写锁定.

(2b) If you face reader starvation, which is even more unlikely, the best solution is to split the collection writes and reads apart. Hold the collection by shared pointer, and have a single write lock on it.

阅读:取得写锁和shared_ptr的副本.放弃写锁定.读取器现在可以无任何锁定(不可变)地读取集合.

For reads: Take the write lock and a copy of the shared_ptr. Give up the write lock. The reader can now read the collection without any locks (its immutable).

写:根据读者将shared_ptr带到集合中,放弃锁定.制作集合的私有副本并对其进行修改(由于它是私有副本,因此无需锁定).再次获取写锁,并用新集合替换现有的shared_ptr.完成旧集合的最后一个线程将销毁它.将来所有的线程都将使用新更新的集合.

For writes: Take a shared_ptr to the collection as per the reader, giving up the lock. Make a private copy of the collection and modify it (no locks required since its a private copy). Take the write lock again and replace the existing shared_ptr with your new collection. The last thread to finish with the old collection will destroy it. All future threads will use the newly updated collection.

请注意,根据您的问题描述,此算法仅适用于一位作者.

Note that this algorithm only works with one writer, as per your problem description.

这篇关于此多线程用例的最佳数据结构:侵入式列表好吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆