Relative merits between one thread per client and queuing thread models for a threaded server?


Question


Let's say we're building a threaded server intended to run on a system with four cores. The two thread management schemes I can think of are one thread per client connection and a queuing system.

As the first system's name implies, we'll spawn one thread per client that connects to our server. Assuming one thread is always dedicated to our program's main thread of execution, we'll be able to handle up to three clients truly concurrently, and for any more simultaneous clients than that we'll have to rely on the operating system's preemptive multitasking to switch among them (or the VM's, in the case of green threads).

For our second approach, we'll make two thread-safe queues. One is for incoming messages and one is for outgoing messages. In other words, requests and replies. That means we'll probably have one thread accepting incoming connections and placing their requests into the incoming queue. One or two threads will handle the processing of the incoming requests, resolving the appropriate replies, and placing those replies on the outgoing queue. Finally, we'll have one thread just taking replies off of that queue and sending them back out to the clients.

What are the pros and cons of these approaches? Notice that I didn't mention what kind of server this is. I'm assuming that which one has the better performance profile depends on whether the server handles short connections, like web servers and POP3 servers, or longer connections, like WebSocket servers, game servers, and messaging app servers.

Are there other thread management strategies besides these two?

Solution

I believe I've done both organizations at one time or another.


Method 1

Just so we're on the same page, the first has the main thread do a listen. Then, in a loop, it does an accept. It passes the returned socket to pthread_create, and the client thread does recv/send in a loop, processing all the commands the remote client wants. When done, it cleans up and terminates.

For an example of this, see my recent answer: multi-threaded file transfer with socket
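In skeleton form, the whole thing might look like this. This is only a minimal sketch in C with pthreads: error handling is omitted, the port number is arbitrary, and the echo reply is a placeholder for real request processing.

#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

static void *client_thread(void *arg)
{
    int sock = (int) (intptr_t) arg;
    char buf[4096];
    ssize_t len;

    // wait for input, process, send output, repeat
    while ((len = recv(sock, buf, sizeof(buf), 0)) > 0) {
        // ... parse the command in buf and build a reply ...
        send(sock, buf, len, 0);        // placeholder: echo the request back
    }

    close(sock);                        // client disconnected: clean up and terminate
    return NULL;
}

int main(void)
{
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };

    addr.sin_family = AF_INET;
    addr.sin_port = htons(12345);       // arbitrary port for the sketch
    addr.sin_addr.s_addr = INADDR_ANY;
    bind(lsock, (struct sockaddr *) &addr, sizeof(addr));
    listen(lsock, SOMAXCONN);

    // main thread: sock = accept, pthread_create(sock), repeat
    while (1) {
        int csock = accept(lsock, NULL, NULL);
        pthread_t tid;
        pthread_create(&tid, NULL, client_thread, (void *) (intptr_t) csock);
        pthread_detach(tid);            // no join needed; the thread cleans itself up
    }
}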

This has the virtue that the main thread and the client threads are straightforward and independent. No thread waits on anything another thread is doing. No thread waits on anything it doesn't have to. Thus, the client threads [plural] can all run at maximum line speed. Also, if one client thread is blocked on a recv or send and another thread can go, it will. It is self-balancing.

All thread loops are simple: wait for input, process, send output, repeat. Even the main thread is simple: sock = accept, pthread_create(sock), repeat.

Another thing. The interaction between the client thread and its remote client can be anything they agree on. Any protocol or any type of data transfer.


Method 2

This is somewhat akin to an N worker model, where N is fixed.

Because the accept is [usually] blocking, we'll need a main thread similar to method 1's. Except that, instead of firing up a new thread, it mallocs a control struct [or uses some other management scheme] and puts the socket in that. It then puts this on a list of client connections and loops back to the accept.
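For instance, the control struct might be as little as the following (the struct and field names are illustrative, not prescribed):

// hypothetical per-connection control struct
struct client_conn {
    int sock;                       // the accepted client socket
    struct client_conn *next;       // link in the main thread's connection list
};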

You are correct about the N worker threads. In addition to them, we need at least two control threads: one to do select/poll, recv, and enqueue the request, and one to wait for a result, do select/poll, and send.

Two threads are needed to prevent either one from having to wait on two different things: the various sockets [as a group] and the request/result queues from the various worker threads. With a single control thread, all actions would have to be non-blocking, and the thread would spin like crazy.
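Underneath, each of these thread-safe queues can be an ordinary mutex-plus-condition-variable structure. Here is a minimal sketch of the blocking enqueue/dequeue that the pseudocode below assumes (initialization via pthread_mutex_init/pthread_cond_init is omitted, and the send thread's client-matching dequeue(result_list, client_id) would additionally search the list by client, which is not shown):

#include <pthread.h>
#include <stdlib.h>

struct queue {
    struct qnode { struct qnode *next; void *data; } *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

void enqueue(struct queue *q, void *data)
{
    struct qnode *n = malloc(sizeof(*n));
    n->data = data;
    n->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);  // wake one blocked consumer
    pthread_mutex_unlock(&q->lock);
}

void *dequeue(struct queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)             // block until something arrives
        pthread_cond_wait(&q->nonempty, &q->lock);

    struct qnode *n = q->head;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    void *data = n->data;
    free(n);
    return data;
}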

Here is an [extremely] simplified version of what the threads look like:

// control thread for recv:
while (1) {
    // (1) do blocking poll on all client connection sockets for read
    poll(...)

    // (2) for all pending sockets do a recv for a request block and enqueue
    //     it on the request queue
    for (all in read_mask) {
        request_buf = dequeue(control_free_list);
        recv(request_buf);
        enqueue(request_list,request_buf);
    }
}

// control thread for send:
while (1) {
    // (1) do blocking wait on result queue

    // (2) peek at all result queue elements and create aggregate write mask
    //     for poll from the socket numbers

    // (3) do blocking poll on all client connection sockets for write
    poll(...)

    // (4) for all pending sockets that can be written to
    for (all in write_mask) {
        // find and dequeue first result buffer from result queue that
        // matches the given client
        result_buf = dequeue(result_list,client_id);
        send(result_buf);
        enqueue(control_free_list,result_buf);
    }
}

// worker thread:
while (1) {
    // (1) do blocking wait on request queue
    request_buf = dequeue(request_list);

    // (2) process request ...

    // (3) enqueue the result for the send control thread
    enqueue(result_list,request_buf);
}

Now, a few things to notice. Only one request queue was used for all the worker threads. The recv control thread did not try to pick an idle [or underutilized] worker thread and enqueue to a thread-specific queue [that is another option to consider].

The single request queue is probably the most efficient. But, maybe, not all worker threads are created equal. Some may end up on CPU cores [or cluster nodes] that have special acceleration H/W, so some requests may have to be sent to specific threads.

And, if that is done, can a thread do "work stealing"? That is, a thread completes all its work and notices that another thread has a request in its queue [that is compatible] but hasn't been started. The thread dequeues the request and starts working on it.

Here's a big drawback to this method. The request/result blocks are of [mostly] fixed size. I've done an implementation where the control block could have a field for a "side/extra" payload pointer that could be an arbitrary size.
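Such a block might look something like this (purely illustrative; the field names and the inline payload size are assumptions, not from any particular implementation):

#include <stddef.h>

// hypothetical fixed-size request/result block
struct request_buf {
    int client_id;                  // which client connection this belongs to
    int opcode;                     // what the worker should do
    char data[512];                 // fixed-size inline payload
    void *extra;                    // optional "side/extra" payload pointer
    size_t extra_len;               // size of the side payload (0 if none)
};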

But, if doing a large file transfer, either upload or download, trying to pass it piecemeal through request blocks is not a good idea.

In the download case, the worker thread could usurp the socket temporarily and send the file data before enqueuing the result to the control thread.

But, for the upload case, if the worker tried to do the upload in a tight loop, it would conflict with the recv control thread. The worker would have to [somehow] alert the control thread not to include the socket in its poll mask.
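One way to express that alert, assuming a claimed flag is added to the hypothetical client_conn struct sketched earlier: the worker sets the flag before its tight loop and clears it afterward, and the recv control thread skips flagged connections when rebuilding its poll set. MAX_CLIENTS and conn_list are assumed names:

#include <poll.h>

// recv control thread, rebuilding its poll set each iteration (sketch)
struct pollfd pfds[MAX_CLIENTS];
int nfds = 0;

for (struct client_conn *c = conn_list; c != NULL; c = c->next) {
    if (c->claimed)                 // a worker has usurped this socket
        continue;
    pfds[nfds].fd = c->sock;
    pfds[nfds].events = POLLIN;
    nfds++;
}
poll(pfds, nfds, -1);

A real implementation would also need a way to wake the control thread out of an in-progress poll when a flag changes (e.g. the self-pipe trick), which is part of why this gets complex.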

This is beginning to get complex.

And, there is overhead to all this request/result block enqueue/dequeue.

Also, the two control threads are a "hot spot". The entire throughput of the system depends on them.

And, there are interactions between the sockets. In the simple case, the recv thread can start a recv on one socket, but other clients wishing to send requests are delayed until that recv completes. It is a bottleneck.

This means that all recv syscalls have to be non-blocking [asynchronous]. The control thread has to manage these async requests (i.e. initiate one and wait for an async completion notification, and only then enqueue the request on the request queue).
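Making the socket non-blocking is the easy part; it is the standard fcntl idiom shown below. Managing the partially received message state per socket is where the complexity lives.

#include <fcntl.h>

// put a socket into non-blocking mode so a slow client cannot stall
// the recv control thread in the middle of a message
int flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);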

This is beginning to get complicated.

The main benefit of doing all this is being able to handle a large number of simultaneous clients (e.g. 50,000) while keeping the number of threads at a sane value (e.g. 100).

Another advantage of this method is that it is possible to assign priorities and use multiple priority queues.
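For example, a worker might drain a high-priority queue before the normal one. A sketch, assuming both queues share one lock and condition variable (queue_lock, queue_nonempty, high_list, normal_list, and pop_locked are all illustrative names):

// block until either queue has work, then prefer the high-priority one
pthread_mutex_lock(&queue_lock);
while (high_list.head == NULL && normal_list.head == NULL)
    pthread_cond_wait(&queue_nonempty, &queue_lock);
request_buf = (high_list.head != NULL) ? pop_locked(&high_list)
                                       : pop_locked(&normal_list);
pthread_mutex_unlock(&queue_lock);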


Comparison and hybrids

Meanwhile, method 1 does everything that method 2 does, but in a simpler, more robust [and, I suspect, higher-throughput] way.

After a method 1 client thread is created, it might split the work up and create several sub-threads. It could then act like the control threads of method 2. In fact, it might draw these threads from a fixed N pool, just like method 2.

This would compensate for a weakness of method 1: a thread that is going to do heavy computation. With a large number of threads all doing computation, the system would get swamped. The queuing approach helps alleviate this. The client thread is still created/active, but it is sleeping on the result queue.

So, we've just muddied up the waters a bit more.

Either method could be the "front facing" method and have elements of the other underneath.

A given client thread [method 1] or worker thread [method 2] could farm out its work by opening [yet] another connection to a "back office" compute cluster. The cluster could be managed with either method.

So, method 1 is simpler and easier to implement, and it can easily accommodate most job mixes. Method 2 might be better for heavy-compute servers, to throttle requests to limited resources. But care must be taken with method 2 to avoid bottlenecks.
