Which would be better for concurrent tasks on node.js? Fibers? Web-workers? or Threads?

Question

I stumbled over node.js some time ago and like it a lot. But I soon found out that it badly lacks the ability to perform CPU-intensive tasks. So I started googling and found these answers to the problem: Fibers, Webworkers and Threads (thread-a-gogo). Now which one to use is a confusion, and one of them definitely needs to be used. After all, what's the purpose of having a server that is just good at I/O and nothing else? Suggestions needed!

Update:

I have been thinking of an approach lately and just need suggestions on it. What I thought of was this: let's have some threads (using thread_a_gogo or maybe webworkers). When we need more of them, we can create more. But there will be some limit on how many we create (not imposed by the system, but probably because of overhead). When we exceed that limit, we can fork a new node process and start creating threads in it. This can go on until we reach another limit (after all, processes have a big overhead too). When that limit is reached, we start queuing tasks. Whenever a thread becomes free, it is assigned a new task. This way, things can go on smoothly.

So, that was what I thought of. Is this idea good? I am a bit new to all this process and thread stuff, so I don't have any expertise in it. Please share your opinions.

Thanks. :)

Recommended answer

Node has a completely different paradigm, and once it is correctly understood, it is easier to see this different way of solving problems. You never need multiple threads in a Node application (1) because you have a different way of doing the same thing. You create multiple processes; but this is very different from, for example, how Apache Web Server's Prefork MPM does it.

For now, let's assume that we have just one CPU core and that we will develop an application (in Node's way) to do some work. Our job is to process a big file, running over its contents byte by byte. The best way for our software is to start at the beginning of the file and follow it byte by byte to the end.

-- Hey, Hasan, I suppose you are either a newbie or very old school, from my grandfather's time! Why don't you create some threads and make it much faster?

-- Oh, we have only one CPU core.

-- So what? Create some threads, man, make it faster!

-- It does not work like that. If I create threads, I will be making it slower, because I will be adding a lot of overhead to the system: switching between threads, trying to give each of them a fair amount of time, and, inside my process, trying to communicate between these threads. On top of all this, I will also have to think about how to divide a single job into multiple pieces that can be done in parallel.

-- Okay, okay, I see you are poor. Let's use my computer, it has 32 cores!

-- Wow, you are awesome, my dear friend, thank you very much. I appreciate it!

Then we get back to work. Now we have 32 CPU cores thanks to our rich friend. The rules we have to abide by have just changed. Now we want to utilize all this wealth we have been given.

To use multiple cores, we need to find a way to divide our work into pieces that we can handle in parallel. If it were not Node, we would use threads for this: 32 threads, one for each CPU core. However, since we have Node, we will create 32 Node processes.
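As a rough sketch of what "dividing the work into pieces" could look like for the big-file job, the helper below splits a file's byte range into contiguous chunks, one per process. The name `splitRanges` and the assumption that each chunk can be processed independently are illustrative, not something the answer specifies.

```javascript
// Hypothetical helper (not part of Node): split the byte range [0, totalBytes)
// into `parts` contiguous ranges whose sizes differ by at most one byte.
function splitRanges(totalBytes, parts) {
  const base = Math.floor(totalBytes / parts);
  const extra = totalBytes % parts; // the first `extra` ranges get one more byte
  const ranges = [];
  let start = 0;
  for (let i = 0; i < parts; i++) {
    const size = base + (i < extra ? 1 : 0);
    ranges.push({ start, end: start + size }); // `end` is exclusive
    start += size;
  }
  return ranges;
}
```

Calling `splitRanges(fileSize, 32)` would give one range per process on the 32-core machine; whether the job can really be cut at arbitrary byte offsets depends on the data.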

Threads can be a good alternative to Node processes, maybe even a better way; but only for a specific kind of job where the work is already defined and we have complete control over how to handle it. For every other kind of problem, where the job comes from outside in a way we do not control and we want to answer as quickly as possible, Node's way is unarguably superior.

-- Hey, Hasan, are you still working single-threaded? What is wrong with you, man? I have just provided you with what you wanted. You have no excuses anymore. Create threads, make it run faster.

-- I have divided the work into pieces, and every process will work on one of these pieces in parallel.

-- Why don't you create threads?

-- Sorry, I don't think that is usable here. You can take your computer back if you want.

-- No, okay, I am cool, I just don't understand why you don't use threads.

-- Thank you for the computer. :) I have already divided the work into pieces, and I create processes to work on these pieces in parallel. All the CPU cores will be fully utilized. I could do this with threads instead of processes; but Node does it this way, and my boss Parth Thakkar wants me to use Node.

-- Okay, let me know if you need another computer. :p

If I create 33 processes instead of 32, the operating system's scheduler will keep pausing one of them, starting another, pausing that one after some cycles, starting the first one again... This is unnecessary overhead, and I do not want it. In fact, on a system with 32 cores, I wouldn't even want to create exactly 32 processes; 31 can be nicer, because my application is not the only thing that will run on this system. Leaving a little room for other things can be good, especially if we have 32 rooms.
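A minimal sketch of this sizing rule, using Node's `os` and `child_process` modules. The function names and the `workerScript` path are hypothetical; the worker script is assumed to listen with `process.on("message", ...)` and process whatever piece it is sent.

```javascript
const os = require("node:os");
const { fork } = require("node:child_process");

// Leave one core free for the rest of the system, but always keep at least one worker.
function pickWorkerCount(coreCount = os.cpus().length) {
  return Math.max(1, coreCount - 1);
}

// Hypothetical launcher: starts one Node process per piece of work and hands
// each process its piece over the built-in IPC channel.
function startWorkers(workerScript, pieces) {
  return pieces.map((piece) => {
    const child = fork(workerScript);
    child.send(piece);
    return child;
  });
}
```

On the 32-core machine from the dialogue, `pickWorkerCount(32)` gives 31. The processes are created once at startup, so the cost of forking is paid only once.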

I believe we are on the same page now about fully utilizing processors for CPU-intensive tasks.

-- Hmm, Hasan, I am sorry for mocking you a little. I believe I understand you better now. But there is still something I need explained: what is all the buzz about running hundreds of threads? I read everywhere that threads are much faster to create and destroy than forked processes. You fork processes instead of creating threads, and you think that is the most you can get out of Node. So is Node not appropriate for this kind of work?

-- No worries, I am cool, too. Everybody says these things, so I think I am used to hearing them.

-- So? Node is not good for this?

-- Node is perfectly good for this, even though threads can be good too. As for thread/process creation overhead: on things that you repeat a lot, every millisecond counts. However, I create only 32 processes, and that takes a tiny amount of time. It happens only once. It will not make any difference.

-- When do I want to create thousands of threads, then?

-- You never want to create thousands of threads. However, on a system that does work coming from outside, like a web server processing HTTP requests, if you use a thread for each request, you will be creating a lot of threads.

-- Node is different, though? Right?

-- Yes, exactly. This is where Node really shines. Just as a thread is much lighter than a process, a function call is much lighter than a thread. Node calls functions instead of creating threads. In the example of a web server, every incoming request causes a function call.

-- Hmm, interesting; but you can only run one function at a time if you are not using multiple threads. How can this work when a lot of requests arrive at the web server at the same time?

-- You are perfectly right about how functions run: one at a time, never two in parallel. I mean that in a single process, only one scope of code is running at a time. The OS scheduler does not come and pause this function to switch to another one; it only pauses the whole process to give time to another process, not to another thread in our process. (2)

-- Then how can a process handle 2 requests at a time?

-- A process can handle tens of thousands of requests at a time as long as our system has enough resources (RAM, network, etc.). How those functions run is THE KEY DIFFERENCE.

-- Hmm, should I be excited now?

-- Maybe :) Node runs a loop over a queue. In this queue are our jobs, i.e., the calls we started in order to process incoming requests. The most important point here is the way we design our functions to run. Instead of starting to process a request and making the caller wait until we finish the job, we quickly end our function after doing an acceptable amount of work. When we come to a point where we need to wait for another component to do some work and return a value to us, instead of waiting for it, we simply finish our function and add the rest of the work to the queue.

-- It sounds too complex?

-- No no, I might sound complex; but the system itself is very simple and it makes perfect sense.
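To make the "every incoming request causes a function call" point from the dialogue concrete, here is a small sketch using Node's `http` module. The handler name and the response text are made up for illustration; the server is created but deliberately not started.

```javascript
const http = require("node:http");

let handled = 0;

// Node calls this function once per incoming request. No thread is created;
// the call simply runs on the process's single JavaScript thread.
function handleRequest(req, res) {
  handled += 1;
  res.end(`request #${handled} for ${req.url}\n`);
}

// The server only registers the handler here; call server.listen(port) to
// start accepting connections.
const server = http.createServer(handleRequest);
```

Because the handler returns quickly, the same process can move on to the next queued request right away, which is why one process can keep so many connections open.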

Now I want to stop citing the dialogue between these two developers and finish my answer after one last quick example of how these functions work.

In this way, we are doing what the OS scheduler would normally do. We pause our work at some point and let other function calls (like other threads in a multi-threaded environment) run until we get our turn again. This is much better than leaving the work to the OS scheduler, which tries to give a fair share of time to every thread on the system. We know what we are doing much better than the OS scheduler does, and we are expected to stop when we should stop.
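The sketch below illustrates this cooperative hand-off with two calls that each do a little work, queue the rest, and return. The function name `handle` and the `log` array are only there for illustration, and `setImmediate` stands in for "waiting on another component".

```javascript
const log = [];

// Do the first part of the job, queue the remainder, and return right away.
function handle(name, done) {
  log.push(`${name}: start`);
  setImmediate(() => {
    // The rest of the job runs later, when the event loop reaches it.
    log.push(`${name}: resumed`);
    done();
  });
  log.push(`${name}: returned`);
}
```

Calling `handle("A", ...)` and then `handle("B", ...)` records both starts and both returns before either job resumes: B got to run while A was "paused", without any thread switch.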

Below is a simple example where we open a file and read it to do some work on the data.

Synchronous way:

Open File
Repeat This:    
    Read Some
    Do the work

Asynchronous way:

Open File and Do this when it is ready: // Our function returns
    Repeat this:
        Read Some and when it is ready: // Returns again
            Do some work

As you can see, our function asks the system to open a file and does not wait for it to be opened. It finishes by providing the next steps to take once the file is ready. When we return, Node runs the other function calls in the queue. After running over all the functions, the event loop moves on to the next turn...
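The asynchronous outline above could be written roughly as follows with Node's callback-based `fs` API. The function name `sumBytes`, the 64 KiB buffer size, and the choice of "summing the bytes" as the work are assumptions made for this sketch.

```javascript
const fs = require("node:fs");

// Open the file, then read it piece by piece. Every step passes a callback to
// Node and returns immediately, so other queued calls can run in between.
function sumBytes(path, done) {
  fs.open(path, "r", (openErr, fd) => {
    if (openErr) return done(openErr);
    const buffer = Buffer.alloc(64 * 1024);
    let sum = 0;

    function readSome() {
      // position = null means "continue from the current file position"
      fs.read(fd, buffer, 0, buffer.length, null, (readErr, bytesRead) => {
        if (readErr) return fs.close(fd, () => done(readErr));
        if (bytesRead === 0) return fs.close(fd, () => done(null, sum)); // end of file
        for (let i = 0; i < bytesRead; i++) sum += buffer[i]; // "do some work"
        readSome(); // ask for the next piece and return again
      });
    }

    readSome();
  });
}
```

Each callback corresponds to one "when it is ready" step in the outline. In practice `fs.createReadStream` packages the same pattern behind `data` and `end` events.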

In summary, Node has a completely different paradigm from multi-threaded development; but this does not mean that it lacks things. For a synchronous job (where we can decide the order and way of processing), it works as well as multi-threaded parallelism. For a job that comes from outside, like requests to a server, it is simply superior.

(1) Unless you are building libraries in other languages like C/C++, in which case you still do not create threads for dividing jobs. For this kind of work you have two threads, one of which keeps communicating with Node while the other does the real work.

(2) In fact, every Node process has multiple threads, for the same reasons I mentioned in the first footnote. However, this is nothing like 1000 threads doing similar work. Those extra threads are for things like accepting IO events and handling inter-process messaging.

@Mark, thank you for the constructive criticism. In Node's paradigm, you should never have functions that take too long to process unless all other calls in the queue are designed to be run one after another. In the case of computationally expensive tasks, if we look at the complete picture, we see that this is not a question of "Should we use threads or processes?" but a question of "How can we divide these tasks in a well-balanced manner into sub-tasks that we can run in parallel, employing multiple CPU cores on the system?" Let's say we will process 400 video files on a system with 8 cores. If we want to process one file at a time, then we need a system that will process different parts of the same file, in which case, maybe, a multi-threaded single-process system will be easier to build and even more efficient. We can still use Node for this by running multiple processes and passing messages between them when state sharing or communication is necessary. As I said before, a multi-process approach with Node is as good as a multi-threaded approach for this kind of task, but not better. Again, as I said before, the situation where Node shines is when these tasks come into the system as input from multiple sources, since keeping many connections open concurrently is much lighter in Node than in a thread-per-connection or process-per-connection system.
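If the 400 files can instead be processed independently of each other, one simple way to keep 8 cores busy is a fixed number of slots pulling from a shared list. This is only a sketch under that assumption: `runPool` is a hypothetical helper, and `worker` is assumed to be an async function that does the actual processing, for example by handing the file to a child process and resolving when it reports back.

```javascript
// Run `worker` over all items with at most `limit` of them in flight at once.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has taken yet

  // Each slot keeps taking the next unclaimed item until the list is exhausted.
  async function slot() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const slots = Array.from({ length: Math.min(limit, items.length) }, () => slot());
  await Promise.all(slots);
  return results;
}
```

With `runPool(videoFiles, 8, processInChildProcess)`, where both arguments are placeholders, a new file starts as soon as any of the 8 slots becomes free, which is also the queueing behaviour described in the question.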

As for setTimeout(..., 0) calls: sometimes it is necessary to take a break during a time-consuming task so that the other calls in the queue get their share of processing. Dividing tasks differently can save you from this; but still, it is not really a hack, it is just the way event queues work. Also, process.nextTick is much better suited to this aim, since setTimeout requires calculating and checking the elapsed time, while process.nextTick is simply what we really want: "Hey task, go back to the end of the queue, you have used your share!" (This describes process.nextTick as it behaved when this was written; since Node 0.10, process.nextTick callbacks run before pending I/O events, and setImmediate is the call that sends a task to the back of the queue.)
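A small sketch of "giving a break" during a long task. It uses `setImmediate` rather than `process.nextTick`, because in current Node versions `nextTick` callbacks run before the event loop gets back to pending I/O. The function name `sumInChunks` and the summing job are made up for illustration.

```javascript
// Sum a large array in slices, yielding to the event loop between slices so
// that other queued calls get their share of processing.
function sumInChunks(values, chunkSize, done) {
  let index = 0;
  let total = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, values.length);
    for (; index < end; index++) total += values[index];
    if (index < values.length) {
      setImmediate(runChunk); // go to the back of the queue and continue later
    } else {
      done(total);
    }
  }

  runChunk();
}
```

The chunk size is the knob here: larger chunks mean less scheduling overhead, smaller chunks mean other calls wait less.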
