I/O bound and CPU bound


Problem description

Hi.

I'm using Node.JS with child_process to spawn bash processes. I'm trying to understand whether I'm I/O bound, CPU bound, or both.

I'm using pdftotext to extract the text of 10k+ files. To control concurrency, I'm using async.

Code:

let spawn = require('child_process').spawn;
let async = require('async');
let files = [
  {
    path: 'path_for_file'
    ...
  },
  ...
];
let maxNumber = 5;

async.mapLimit(files, maxNumber, (file, callback) => {
  // `spawn` is imported directly above, so call it directly
  // (not `child_process.spawn`, which is undefined here)
  let child = spawn('pdftotext', [
    "-layout",
    "-enc",
    "UTF-8",
    file.path,
    "-"
  ]);
  let result = '';
  let error = '';

  child.stdout.on('data', function(chunk) {
    result += chunk.toString();
  });

  // stderr emits 'data' events for its output; listening for 'error'
  // would never capture the process's error text
  child.stderr.on('data', function(chunk) {
    error += chunk.toString();
  });

  child.on('close', function(code) {
    if (error) {
      return callback(error, null);
    }
    callback(null, result);
  });
}, function(error, results) {
  if (error) {
    throw new Error(error);
  }

  console.log(results);
});

I'm monitoring my Ubuntu usage, and my CPU and memory are very high when I run the program. Also, sometimes I see only one file being processed at a time. Is this normal? What could be the problem?

I'm trying to understand the concept of child_process. Is pdftotext a child process of Node.JS? Are all the child processes running on only one core? And how can I make the file processing easier on my computer?

Cool image of glances:

[screenshot: glances system monitor showing high CPU and memory usage]

Is this usage of Node.JS because of the child processes?

[screenshot]

Thanks.

Recommended answer

If your jobs are CPU hungry, then the optimal number of jobs to run is typically the number of cores (or double that if the CPUs have hyperthreading). So if you have a 4-core machine, you will typically see the optimal speed by running 4 jobs in parallel.

However, modern CPUs depend heavily on caches. This makes it hard to predict the optimal number of jobs to run in parallel. Throw in the latency of disks and it becomes even harder.

I have even seen jobs on systems where the cores shared the CPU cache, and where it was faster to run a single job at a time - simply because it could then use the full CPU cache.

Because of that experience my advice has always been: measure.

So if you have 10k jobs to run, try running 100 random jobs with different numbers of parallel jobs to see what the optimal number is for you. It is important to pick the jobs at random, so you also get to measure the disk I/O. If the files differ greatly in size, run the test a few times.

find pdfdir -type f > files
mytest() {
  # Run pdftotext on 100 random files with $1 jobs in parallel
  shuf files | head -n 100 |
    parallel -j "$1" pdftotext -layout -enc UTF-8 {} - > out
}
export -f mytest
# Test with 1..10 parallel jobs. Sort by JobRuntime.
seq 10 | parallel -j1 --joblog - mytest | sort -nk 4

Do not worry about your CPUs running at 100%. That just means you are getting a return on all the money you spent at the computer store.

Your RAM is only a problem if the disk cache gets low (in your screenshot, 754M is not low; below 100M is low), because that may cause your computer to start swapping - which can slow it to a crawl.
