Maximizing Worker Thread Utilization


Problem Description


To solve a problem (and better my understanding of multitasking) I have written a small thread pool implementation. This thread pool spins up a number of worker threads which pop tasks off of a queue as they are added by the client of the thread pool. For the purposes of this question when the task queue is empty the worker threads are all terminated.

After doing some basic benchmarking I have discovered the application spends ~60% of its time waiting to acquire the queue lock. Presumably this is mostly taking place within the worker threads.

Is this merely an indication I'm not giving the worker threads enough to do, or something more? Is there something straightforward I may be missing to increase worker thread throughput?

EDIT: Here is some rough pseudocode that should illustrate things somewhat. These are the only two places where a lock is acquired/released during the execution of the worker threads (which accounts for the vast majority of the application's running time).

#include <list>
#include <mutex>

// Assumed declarations so the sketch compiles:
typedef std::lock_guard<std::mutex> lock_type;
std::mutex task_mutex;

std::list<task_t> task_list;

// Called by the client to add tasks to the thread pool
void insert_task(const task_t& task)
{
    lock_type listlock(task_mutex);

    task_list.push_back(task);
}

// The base routine of each thread in the pool. Some details
// such as lifetime management have been omitted for clarity.
void worker_thread_base()
{
    while (true)
    {
        task_t task;

        {
            lock_type listlock(task_mutex);

            if (task_list.empty())
                continue;   // queue empty: release the lock and spin again

            task = task_list.front();
            task_list.pop_front();
        }

        do_task(task);
    }
}

Solution

Your design has each idle worker thread sitting in a loop, "spinning" as it repeatedly acquires the lock only to find the queue empty. This happens constantly unless every worker thread is busy performing work - in which case the lock sits unacquired and the work gets done.

With all of your threads just sitting, spinning on a lock, you're going to use quite a bit of CPU time waiting. This is somewhat expected, given your design.

You'll find that the percentage of time blocked will likely shrink dramatically if you have fewer worker threads - and at the point where you have more work items than threads, you'll spend very little time waiting on that lock.

A much better design would be to use some form of lock-free queue for your work queue, as this would avoid blocking at this point. In addition, a wait handle that blocks the worker threads until there is work in the queue would prevent the unnecessary spinning.
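The wait-handle idea can be sketched with a `std::condition_variable`: workers sleep inside `pop()` until a producer notifies them, instead of repeatedly grabbing the mutex to check an empty list. This is a minimal sketch, not the poster's implementation; `blocking_queue` and the `std::function`-based `task_t` are illustrative names chosen here.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for the poster's task type.
using task_t = std::function<void()>;

class blocking_queue
{
public:
    void push(task_t task)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();  // wake one sleeping worker, if any
    }

    // Blocks (sleeps, does not spin) until a task is available.
    task_t pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wakeups.
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        task_t task = std::move(tasks_.front());
        tasks_.pop();
        return task;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<task_t> tasks_;
};
```

A worker's loop then reduces to `while (true) { pop()(); }`, with the operating system parking the thread whenever the queue is empty, so no CPU time is burned waiting for the lock. (Note this changes the shutdown story: workers no longer terminate on an empty queue, so you'd need a sentinel task or a stop flag checked under the same lock.)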
