使用Boost.Lockfree队列比使用互斥体慢 [英] Using Boost.Lockfree queue is slower than using mutexes

查看:67
本文介绍了使用Boost.Lockfree队列比使用互斥体慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

直到现在,我一直在项目中使用std::queue.我测量了此队列上特定操作所需的平均时间.

Until now I was using std::queue in my project. I measured the average time which a specific operation on this queue requires.

时间是在两台计算机上测量的:我的本地Ubuntu VM和一台远程服务器. 使用std::queue,两台计算机上的平均值几乎相同:〜750微秒.

The times were measured on 2 machines: My local Ubuntu VM and a remote server. Using std::queue, the average was almost the same on both machines: ~750 microseconds.

然后我将std::queue升级"到boost::lockfree::spsc_queue,因此我可以摆脱保护队列的互斥锁.在我的本地VM上,我可以看到巨大的性能提升,现在平均为200微秒.但是,在远程计算机上,平均时间达到800微秒,比以前慢.

Then I "upgraded" the std::queue to boost::lockfree::spsc_queue, so I could get rid of the mutexes protecting the queue. On my local VM I could see a huge performance gain, the average is now on 200 microseconds. On the remote machine however, the average went up to 800 microseconds, which is slower than it was before.

首先,我认为这可能是因为远程计算机可能不支持无锁实现:

First I thought this might be because the remote machine might not support the lock-free implementation:

Boost.Lockfree页面上

并非所有硬件都支持相同的原子指令集.如果它在硬件中不可用,则可以使用防护在软件中对其进行仿真.但是,这样做的明显缺点是失去了无锁属性.

Not all hardware supports the same set of atomic instructions. If it is not available in hardware, it can be emulated in software using guards. However this has the obvious drawback of losing the lock-free property.

要了解是否支持这些说明,boost::lockfree::queue有一个称为bool is_lock_free(void) const;的方法. 但是,boost::lockfree::spsc_queue没有这样的功能,对我来说,这意味着它不依赖硬件,并且在任何计算机上始终是无锁的.

To find out if these instructions are supported, boost::lockfree::queue has a method called bool is_lock_free(void) const;. However, boost::lockfree::spsc_queue does not have a function like this, which, for me, implies that it does not rely on the hardware and that is is always lockfree - on any machine.

性能下降的原因可能是什么?

What could be the reason for the performance loss?

// c++11 compiler and boost library required

#include <iostream>
#include <cstdlib>
#include <chrono>
#include <async>
#include <thread>
/* Using blocking queue:
 * #include <mutex>
 * #include <queue>
 */
#include <boost/lockfree/spsc_queue.hpp>


boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024>> queue;

/* Using blocking queue:
 * std::queue<int> queue;
 * std::mutex mutex;
 */

int main()
{
    auto producer = std::async(std::launch::async, [queue /*,mutex*/]() 
    {
        // Producing data in a random interval
        while(true)
        {
            /* Using the blocking queue, the mutex must be locked here.
             * mutex.lock();
             */

            // Push random int (0-9999)
            queue.push(std::rand() % 10000);

            /* Using the blocking queue, the mutex must be unlocked here.
             * mutex.unlock();
             */

            // Sleep for random duration (0-999 microseconds)
            std::this_thread::sleep_for(std::chrono::microseconds(rand() % 1000));
        }
    }

    auto consumer = std::async(std::launch::async, [queue /*,mutex*/]() 
    {
        // Example operation on the queue.
        // Checks if 1234 was generated by the producer, returns if found.

        while(true)
        {
            /* Using the blocking queue, the mutex must be locked here.
             * mutex.lock();
             */

            int value;
            while(queue.pop(value)
            {
                if(value == 1234)
                    return;
            }

            /* Using the blocking queue, the mutex must be unlocked here.
             * mutex.unlock();
             */

            // Sleep for 100 microseconds
            std::this_thread::sleep_for(std::chrono::microseconds(100));
        }
    }

    consumer.get();
    std::cout << "1234 was generated!" << std::endl;
    return 0;
}

推荐答案

与基于锁的算法相比,无锁算法的性能通常较差.这是关键的原因,它们的使用频率不高.

Lock free algorithms generally perform more poorly than lock-based algorithms. That's a key reason they're not used nearly as frequently.

无锁算法的问题在于,它们通过允许竞争线程继续竞争来最大化争用.锁通过调度竞争线程来避免争用.仅在不可能对竞争线程进行调度的情况下,才应使用无锁算法,这是一个近似的方法.这很少适用于应用程序级代码.

The problem with lock free algorithms is that they maximize contention by allowing contending threads to continue to contend. Locks avoid contention by de-scheduling contending threads. Lock free algorithms, to a first approximation, should only be used when it's not possible to de-schedule contending threads. That only rarely applies to application-level code.

让我给您一个非常极端的假设.想象一下,在典型的现代双核CPU上运行着四个线程.线程A1和A2在操纵集合A.线程B1和B2在操纵集合B.

Let me give you a very extreme hypothetical. Imagine four threads are running on a typical, modern dual-core CPU. Threads A1 and A2 are manipulating collection A. Threads B1 and B2 are manipulating collection B.

首先,让我们想象一下集合使用锁.这意味着如果线程A1和A2(或B1和B2)尝试同时运行,则其中一个将被锁阻塞.因此,很快,将有一个A线程和一个B线程正在运行.这些线程将非常快速地运行,并且不会竞争.每当线程尝试竞争时,冲突的线程都会被调度.是的.

First, let's imagine the collection uses locks. That will mean that if threads A1 and A2 (or B1 and B2) try to run at the same time, one of them will get blocked by the lock. So, very quickly, one A thread and one B thread will be running. These threads will run very quickly and will not contend. Any time threads try to contend, the conflicting thread will get de-scheduled. Yay.

现在,想象一下该集合不使用锁.现在,线程A1和A2可以同时运行.这将引起持续的争用.收集的高速缓存行将在两个内核之间进行乒乓球.核心间总线可能会饱和.性能会很糟糕.

Now, imagine the collection uses no locks. Now, threads A1 and A2 can run at the same time. This will cause constant contention. Cache lines for the collection will ping-pong between the two cores. Inter-core buses may get saturated. Performance will be awful.

再次,这被高度夸大了.但是你明白了.您想避免争用,不要承受太多的争斗.

Again, this is highly exaggerated. But you get the idea. You want to avoid contention, not suffer through as much of it as possible.

但是,现在再次运行此思想实验,其中A1和A2是整个系统上的唯一线程.现在,无锁集合可能更好(尽管您可能会发现在这种情况下只拥有一个线程会更好!).

However, now run this thought experiment again where A1 and A2 are the only threads on the entire system. Now, the lock free collection is probably better (though you may find that it's better just to have one thread in that case!).

几乎每个程序员都经历了一个阶段,他们认为锁是不好的,避免锁会使代码运行得更快.最终,他们意识到争用会使事情变慢并锁定,正确使用它可以最大程度地减少争用.

Almost every programmer goes through a phase where they think that locks are bad and avoiding locks makes code go faster. Eventually, they realize that it's contention that makes things slow and locks, used correctly, minimize contention.

这篇关于使用Boost.Lockfree队列比使用互斥体慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆