Use more than one core in bash


Problem Description

I have a Linux tool that (greatly simplifying) trims the sequences specified in an Illumina sequencing file. I have 32 files to grind through. One file is processed in about 5 hours. I have a server running CentOS; it has 128 cores.

I've found a few solutions, but each one works in a way that uses only one core. The last one seems to fire off 32 nohups, but it still pushes the whole workload through one core.

My question is: does anyone have any idea how to use the server's potential? Basically, every file can be processed independently; there are no relations between them.

This is the current version of the script, and I don't know why it only uses one core. I wrote it with the help of advice here on Stack Overflow and things found on the Internet:

#!/bin/bash
FILES=/home/daw/raw/*
count=0

for f in $FILES
do
  base=${f##*/}
  echo "process $f file..."
  nohup /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o "OUT$base" $f &
  (( count++ ))
  if (( count = 31 )); then
        wait
        count=0
  fi
done

To explain: FILES is the list of files from the raw folder.

The "core" line that executes nohup: the first path is the path to the tool, the -a path is the path to the file with the adapter patterns to cut, -o writes the output to a file named like the input with OUT prepended, and the last argument is the input file to be processed.

Here is the tool's README: https://github.com/vsbuffalo/scythe

Does anybody know how to handle it?

P.S. I also tried moving nohup before the count, but it still uses one core. I have no limitations on the server.

Recommended Answer

IMHO, the most likely solution is GNU Parallel; you can run up to, say, 64 jobs in parallel with something like this:

parallel -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{.} {} ::: /home/daw/raw/*
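For reference, {} stands for the current input argument (one of the files matched after :::) and {.} stands for that argument with its file extension removed. So, assuming a hypothetical input file /home/daw/raw/sample1.fastq, one job would run roughly:

/home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT/home/daw/raw/sample1 /home/daw/raw/sample1.fastq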

This has the benefit that jobs are not batched: it keeps 64 running at all times, starting a new one as each job finishes, which is better than potentially waiting 4.9 hours for the first 31 of your jobs to finish before starting the last one, which then takes a further 5 hours. Note that I arbitrarily chose 64 jobs here; if you don't specify otherwise, GNU Parallel will run 1 job per CPU core you have.

Other useful parameters are:

  • parallel --bar ... gives a progress bar
  • parallel --dry-run ... does a dry run so you can see what it would do without actually doing anything, as shown in the example after this list
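For example, combining these with the command above (same paths and job count as before), first preview the commands without running anything, then run with a progress bar:

parallel --dry-run -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{.} {} ::: /home/daw/raw/*
parallel --bar -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{.} {} ::: /home/daw/raw/*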

If you have multiple servers available, you can add them in a list and GNU Parallel will distribute the jobs amongst them too:

parallel -S server1,server2,server3 ...
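A fuller form, assuming the scythe binary and the input files are reachable at the same paths on every server (for example via a shared filesystem), might look like this:

parallel -S server1,server2,server3 -j 64 /home/daw/scythe/scythe -a /home/daw/scythe/illumina_adapters.fa -o OUT{.} {} ::: /home/daw/raw/*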
