Are thread pools needed for pure Haskell code?


Problem description


In Real World Haskell, Chapter 28, Software transactional memory, a concurrent web link checker is developed. It fetches all the links in a webpage and hits every one of them with a HEAD request to figure out if the link is active. A concurrent approach is taken to build this program and the following statement is made:

We can't simply create one thread per URL, because that may overburden either our CPU or our network connection if (as we expect) most of the links are live and responsive. Instead, we use a fixed number of worker threads, which fetch URLs to download from a queue.

I do not fully understand why this pool of threads is needed instead of using forkIO for each link. AFAIK, the Haskell runtime maintains a pool of threads and schedules them appropriately so I do not see the CPU being overloaded. Furthermore, in a discussion about concurrency on the Haskell mailing list, I found the following statement going in the same direction:

The one paradigm that makes no sense in Haskell is worker threads (since the RTS does that for us); instead of fetching a worker, just forkIO instead.

Is the pool of threads only required for the network part, or is there a CPU reason for it too?

Solution

The core issue, I imagine, is the network side. If you have 10,000 links and forkIO for each link, then you potentially have 10,000 sockets you're attempting to open at once, which, depending on how your OS is configured, probably won't even be possible, much less efficient.
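One way to keep forkIO per link while still capping how many sockets are open at once is to guard the network call with a counting semaphore. The following is only a minimal sketch using base: mapBounded is a hypothetical helper name, not a library function, and the action passed in stands for whatever performs the HEAD request. It is not the approach taken in the book, which uses a fixed pool of workers reading from a queue.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, try)

-- | Run an action on every input, one green thread per input, but let at
-- most 'limit' actions be in flight at any moment.
-- 'mapBounded' is a hypothetical name chosen for this sketch.
mapBounded :: Int -> (a -> IO b) -> [a] -> IO [Either SomeException b]
mapBounded limit action inputs = do
  sem <- newQSem limit
  -- Fork one thread per input; each thread reports through its own MVar.
  slots <- mapM (spawn sem) inputs
  -- Collect the results in input order.
  mapM takeMVar slots
  where
    spawn sem x = do
      slot <- newEmptyMVar
      _ <- forkIO $ do
        -- bracket_ releases the semaphore even if the action throws;
        -- try turns the exception into a Left so the MVar is always filled.
        r <- try (bracket_ (waitQSem sem) (signalQSem sem) (action x))
        putMVar slot r
      pure slot
```

Calling mapBounded 50 checkLink urls, with checkLink being an assumed function that does the request, would still create one cheap green thread per URL, but only 50 of them would hold a connection at a time; the rest block on the semaphore.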

However, the fact that we have green threads that get "virtually" scheduled across multiple OS threads (which ideally are pinned to individual cores) doesn't mean that we can just distribute work randomly without regard to CPU usage either. The issue here isn't so much that the scheduling of the CPU itself won't be handled for us, but rather that context switches (even green ones) cost cycles. Each thread, if it's working on different data, will need to pull that data into the CPU. If there's enough data, that means pulling things in and out of the CPU cache. Even absent that, it means pulling things from the cache to registers, etc.

Even if a problem is trivially parallel, it is virtually never the right idea to just break it up as small as possible and attempt to do it "all at once".
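To make the granularity point concrete, here is a rough sketch, again using only base, that forks one thread per chunk of work instead of one per element. The names chunksOf and mapChunked are hypothetical helpers written for this illustration, and the right chunk size is an assumption that would have to be measured for a real workload.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- | Split a list into consecutive chunks of at most n elements
-- (a chunk size below 1 is treated as 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs = []
  | otherwise =
      let (h, t) = splitAt (max 1 n) xs
       in h : chunksOf n t

-- | Apply a pure function to every element, forking one thread per chunk
-- rather than one per element. No exception handling: if 'f' throws,
-- the collecting thread will not receive that chunk's result.
mapChunked :: Int -> (a -> b) -> [a] -> IO [b]
mapChunked size f xs = do
  slots <- mapM spawn (chunksOf size xs)
  -- Results come back in the original order because slots are read in order.
  concat <$> mapM takeMVar slots
  where
    spawn chunk = do
      slot <- newEmptyMVar
      _ <- forkIO $ do
        let ys = map f chunk
        -- Force each result to weak head normal form inside the worker,
        -- so the work is not silently deferred to the collecting thread.
        mapM_ evaluate ys
        putMVar slot ys
      pure slot
```

With a chunk size of 1 this degenerates into one thread per element, which is exactly the "as small as possible" split the answer warns against; larger chunks trade scheduling overhead for coarser load balancing. Running on several cores also requires building with -threaded and passing +RTS -N.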
