运行多个 nodejs 工作线程:为什么会有这么大的开销/延迟? [英] running multiple nodejs worker threads: why of such a large overhead/latency?

查看：81 发布时间：2021/6/4 20:31:15 node.js linux multithreading cpu-usage

本文介绍了运行多个 nodejs 工作线程:为什么会有这么大的开销/延迟?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在试验 nodejs 工作线程的使用.我遇到了奇怪的延迟时间.

I'm experimenting nodejs worker threads usage. I'm experiencing weird latency elapsed times.

我有一个 main.js 可以快速生成一系列工作线程.
每个 worker.js 执行一次 cpu-boud 计算(生成素数)；

I have a main.js that spawn a rapid sequence of worker threads.
Each worker.js exceute a cpu-boud computation (generate prime numbers);

顺便说一句，generatePrimes() javascipt 函数只是 cpu 绑定计算的演示示例.在我的真实案例中，工作线程是一个绑定 c++ 库的 nodejs 程序(进行语音识别，在 100% CPU 的情况下过去了半秒).

BTW, the generatePrimes() javascipt function is is just a demo example of cpu-bound calculation. In my real case, the worker thread is a nodejs program that bind an c++ library (doing a speech recognition, during half a second elapsed with 100% CPU).

我的 PC 笔记本电脑:Ubuntu 20.04.2 LTS 桌面环境，有 8 个内核:

my PC laptop: Ubuntu 20.04.2 LTS desktop environment, has 8 cores:

$ inxi -C -M
Machine:   Type: Laptop System: HP product: HP Laptop 17-by1xxx v: Type1ProductConfigId serial: <superuser/root required> 
           Mobo: HP model: 8531 v: 17.16 serial: <superuser/root required> UEFI: Insyde v: F.32 date: 12/14/2018 
CPU:       Topology: Quad Core model: Intel Core i7-8565U bits: 64 type: MT MCP L2 cache: 8192 KiB Speed: 700 MHz min/max: 400/4600 MHz Core speeds (MHz): 1: 700 2: 700 3: 700 4: 700 5: 700 6: 700 7: 700 8: 700

$ echo "CPU threads: $(grep -c processor /proc/cpuinfo)"
CPU threads: 8

我体验到当 worker.js 独立运行(单线程)调用函数:generatePrimes 时，计算总共用了 ~8 秒(2, 1e7)

I experienced that computation has a total elapsed of ~8 seconds, when worker.js run independently (single thread) calling function: generatePrimes(2, 1e7)

问题

当我生成多个线程时，例如6 个线程，几乎并行(请参阅下面的代码)，我预计还有大约 8 秒的时间(也许有一个小的开销)，独立于数量产生的线程(它们不是并行运行，有足够的 CPU 内核吗?).相反，我总共经历了更大的可预见的~8 秒.一世总结超过~20秒?！为什么?

以下是使用 time/pidstat 的源代码和一些经过的测量:

Here below source codes and some elapsed measurements using time/pidstat:

main.js

// main.js
const { Worker } = require('worker_threads')
     
  function runThread(workerData) {
   
    return new Promise((resolve, reject) => {
      
      const worker = new Worker('./worker.js', { workerData })
      
      worker.on('message', resolve)
      worker.on('error', reject)
      worker.on('exit', (code) => {
        if (code !== 0)
          reject(new Error(`Worker stopped with exit code ${code}`))
      })
   
    })
   
  }
   
     
  async function main() {
             
    const numThreads = + process.argv[2]
       
    if ( !numThreads || numThreads < 1 ) {
      console.error(`usage: ${process.argv[1]} number_of_threads`)
      process.exit()
    }  

    const min = 2      
    const max = 1e7    
 
    //
    // run multiple threads, in "parallel":  
    //    
    // It simulates a rapid spawn ("parallel") of a specific number of thread computation. 
    // The main thread run numThreads times the same worker thread.  
    //
    // Data results of each thread elaboration is just "done"
    //
   for (let i = 0; i < numeThreads; i++ )
     setImmediate( async () => { console.log( await runThread({min, max}) ) } )
  }     
   
  if (require.main === module)
    main()
   
  module.exports = { runThread }

worker.js

// worker.js
const { threadId, workerData, parentPort } = require('worker_threads')
const { generatePrimes } = require('./generatePrimes')

// take parameters from main/parente thread
const { min, max } = workerData

// synchronous long-running CPU-bound computation
const primes = generatePrimes(min, max)

// communicate result to main thread;
// to avoid any suspect that elapsed times depend on a large amount of data exchange (the primes array in this case),
// the returned data is just a short string. 
parentPort.postMessage( `Done. Thread id: ${threadId}` )

generatePrimes.js

// generatePrimes.js
// long running / CPU-bound calculation

function generatePrimes(start, range) {
  
  const primes = []
  let isPrime = true
  let end = start + range
  
  for (let i = start; i < end; i++) {
    for (let j = start; j < Math.sqrt(end); j++) {
      if (i !== j && i%j === 0) {
        isPrime = false
        break
      }
    }
    if (isPrime) {
      primes.push(i)
    }
    isPrime = true
  }

  return primes
}


function main() {

  const min = 2
  const max = 1e7

  console.log( generatePrimes(min, max) )

}  


if (require.main === module) 
  main()

module.exports = { generatePrimes }

测试

测试 1:没有工作线程 ->已用时间:~8 秒
测试 2:生成 NR.1 线程 ->已用时间:~8 秒
测试 3:生成 NR.6 线程 ->已用时间:~21 秒

测试 1:没有工作线程

generatePrimes.js 独立 ->已用时间:~8 秒

generatePrimes.js standalone -> elapsed: ~8 seconds

$ /usr/bin/time -f "%E" pidstat 1 -u -e node generatePrimes
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx)    22/04/2021  _x86_64_    (8 CPU)

09:19:05      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:19:06     1000    247776   98,02    0,00    0,00    0,00   98,02     5  node
09:19:07     1000    247776  100,00    0,00    0,00    0,00  100,00     5  node
09:19:08     1000    247776  100,00    0,00    0,00    0,00  100,00     5  node
09:19:09     1000    247776  100,00    0,00    0,00    0,00  100,00     5  node
09:19:10     1000    247776  100,00    0,00    0,00    0,00  100,00     5  node
09:19:11     1000    247776  100,00    0,00    0,00    0,00  100,00     5  node
09:19:12     1000    247776  100,00    0,00    0,00    0,00  100,00     5  node
09:19:13     1000    247776  100,00    0,00    0,00    0,00  100,00     5  node
[
    2,   3,   5,   7,  11,  13,  17,  19,  23,  29,  31,  37,
   41,  43,  47,  53,  59,  61,  67,  71,  73,  79,  83,  89,
   97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151,
  157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
  227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
  283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359,
  367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433,
  439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503,
  509, 521, 523, 541,
  ... 664479 more items
]

Average:     1000    247776   99,75    0,00    0,00    0,00   99,75     -  node
0:08.60

测试 2:生成 NR.1 线程

main.js 生成 nr.1 个线程 ->已用时间:~8 秒(再次)

main.js spawn nr. 1 thread -> elapsed: ~8 seconds (again)

$ /usr/bin/time -f "%E" pidstat 1 -u -e node main 1
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx)    22/04/2021  _x86_64_    (8 CPU)

your machine has 8 cores.


09:21:01      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:21:02     1000    247867   95,00    2,00    0,00    0,00   97,00     3  node
09:21:03     1000    247867  100,00    0,00    0,00    0,00  100,00     3  node
09:21:04     1000    247867  100,00    0,00    0,00    0,00  100,00     3  node
09:21:05     1000    247867  100,00    0,00    0,00    0,00  100,00     3  node
09:21:06     1000    247867  100,00    0,00    0,00    0,00  100,00     3  node
09:21:07     1000    247867  100,00    0,00    0,00    0,00  100,00     3  node
09:21:08     1000    247867  100,00    0,00    0,00    0,00  100,00     3  node
09:21:09     1000    247867  100,00    1,00    0,00    0,00  101,00     3  node
Done. Thread id: 1

Average:     1000    247867   99,38    0,38    0,00    0,00   99,75     -  node
0:08.50

测试 3:生成 NR.6 个线程

多 (6) 个线程.->已用时间:~21 秒(再次)

multiple (6) threads. -> elapsed: ~21 seconds (again)

$ /usr/bin/time -f "%E" pidstat 1 -u -e node main 6
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx)    22/04/2021  _x86_64_    (8 CPU)

your machine has 8 cores.


09:23:38      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:23:39     1000    247946  554,00    1,00    0,00    0,00  555,00     0  node
09:23:40     1000    247946  599,00    1,00    0,00    0,00  600,00     0  node
09:23:41     1000    247946  600,00    1,00    0,00    0,00  601,00     0  node
09:23:42     1000    247946  599,00    0,00    0,00    0,00  599,00     0  node
09:23:43     1000    247946  599,00    1,00    0,00    0,00  600,00     0  node
09:23:44     1000    247946  599,00    0,00    0,00    0,00  599,00     0  node
09:23:45     1000    247946  600,00    0,00    0,00    0,00  600,00     0  node
09:23:46     1000    247946  599,00    2,00    0,00    0,00  601,00     0  node
09:23:47     1000    247946  599,00    0,00    0,00    0,00  599,00     0  node
09:23:48     1000    247946  599,00    0,00    0,00    0,00  599,00     0  node
09:23:49     1000    247946  600,00    1,00    0,00    0,00  601,00     0  node
09:23:50     1000    247946  598,00    1,00    0,00    0,00  599,00     0  node
09:23:51     1000    247946  599,00    2,00    0,00    0,00  601,00     0  node
Done. Thread id: 1
Done. Thread id: 4
09:23:52     1000    247946  430,00    0,00    0,00    0,00  430,00     0  node
09:23:53     1000    247946  398,00    0,00    0,00    0,00  398,00     0  node
09:23:54     1000    247946  399,00    1,00    0,00    0,00  400,00     0  node
09:23:55     1000    247946  398,00    0,00    0,00    0,00  398,00     0  node
09:23:56     1000    247946  399,00    0,00    0,00    0,00  399,00     0  node
09:23:57     1000    247946  396,00    3,00    0,00    0,00  399,00     0  node
09:23:58     1000    247946  399,00    0,00    0,00    0,00  399,00     0  node
Done. Thread id: 5
Done. Thread id: 6
09:23:59     1000    247946  399,00    1,00    0,00    0,00  400,00     7  node
Done. Thread id: 2
Done. Thread id: 3

Average:     1000    247946  522,00    0,71    0,00    0,00  522,71     -  node
0:21.05

为什么我得到 ~20 秒 而不是预期的 ~8 秒?我哪里错了?

Why I got ~20 seconds instead of expected ~8 seconds? Where I'm wrong?

更新

为了清楚起见，我将 cpu 绑定函数 generatePrimes 分离在一个单独的模块中.

I separated cpu-bound function generatePrimes in a separated module, just for clarity.

我添加了更多经过的测试，将线程数从 1 增加到 9.测试表明，经过的时间随着生成的线程数的增加而增加.这对我来说毫无意义:(

I added more elapsed tests, incrementing number of threads from 1 to 9. Tests show that elapsed time increase with the number of spawned threads. Thats makes no sense for me :(

$ /usr/bin/time -f "%E" node main 1

your machine has 8 cores.

Done. Thread id: 1
0:08.86
$ /usr/bin/time -f "%E" node main 2

your machine has 8 cores.

Done. Thread id: 2
Done. Thread id: 1
0:13.96
$ /usr/bin/time -f "%E" node main 3

your machine has 8 cores.

Done. Thread id: 2
Done. Thread id: 1
Done. Thread id: 3
0:16.71
$ /usr/bin/time -f "%E" node main 4

your machine has 8 cores.

Done. Thread id: 3
Done. Thread id: 2
Done. Thread id: 4
Done. Thread id: 1
0:21.87
$ /usr/bin/time -f "%E" node main 5

your machine has 8 cores.

Done. Thread id: 3
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
Done. Thread id: 4
0:22.20
$ /usr/bin/time -f "%E" node main 6

your machine has 8 cores.

Done. Thread id: 3
Done. Thread id: 4
Done. Thread id: 6
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
0:23.74
$ /usr/bin/time -f "%E" node main 7

your machine has 8 cores.

Done. Thread id: 3
Done. Thread id: 4
Done. Thread id: 7
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
Done. Thread id: 6
0:32.00
$ /usr/bin/time -f "%E" node main 8

your machine has 8 cores.

Done. Thread id: 6
Done. Thread id: 3
Done. Thread id: 2
Done. Thread id: 5
Done. Thread id: 1
Done. Thread id: 8
Done. Thread id: 7
Done. Thread id: 4
0:35.92
$ /usr/bin/time -f "%E" node main 9

your machine has 8 cores.

warning: number of requested threads (9) is higher than number of available cores (8)
Done. Thread id: 8
Done. Thread id: 4
Done. Thread id: 6
Done. Thread id: 9
Done. Thread id: 2
Done. Thread id: 3
Done. Thread id: 7
Done. Thread id: 5
Done. Thread id: 1
0:40.27

顺便说一句，相关问题:为什么程序执行时间不同，多次运行同一个程序?

BTW, related question: Why program execution time differs running the same program multiple times?

运行多个 nodejs 工作线程:为什么会有这么大的开销/延迟? [英] running multiple nodejs worker threads: why of such a large overhead/latency?

问题描述

推荐答案

相关文章

服务器开发最新文章

热门教程

热门工具

登录关闭

运行多个 nodejs 工作线程:为什么会有这么大的开销/延迟? [英] running multiple nodejs worker threads: why of such a large overhead/latency?

问题描述

推荐答案

相关文章

服务器开发最新文章

热门教程

热门工具

登录 关闭

登录关闭