Shell Script for multithreading a process


Question

I am a bioinformatician and am currently stuck on a problem that requires some scripting to speed up my process. We have a piece of software called PHASE, and the command that I type on the command line to fire it is:

./PHASE test.inp test.out

where PHASE is the name of the program, test.inp is the input file, and test.out is the output file. The process runs on one core and takes approximately 3 hours to complete.

Now I have 1000 input files, say test1.inp, test2.inp, test3.inp ... and so on up to test1000.inp, and want to generate all 1000 output files (test1.out, test2.out ... test1000.out) using the full capacity of my system, which has 4 cores.

To use the full capacity of my system, I want to fire 4 instances of the above command at once, each taking a different input file and generating a different output, like this:

./PHASE test1.inp test1.out
./PHASE test2.inp test2.out
./PHASE test3.inp test3.out
./PHASE test4.inp test4.out

After each job has finished and its output file has been generated, the script should fire up the remaining input files, until all of them have been processed:

./PHASE test5.inp test5.out
./PHASE test6.inp test6.out
./PHASE test7.inp test7.out
./PHASE test8.inp test8.out 

and so on.

How do I write a script for the above process that takes advantage of all 4 cores and speeds up my processing?

Recommended answer

If you have GNU xargs, consider something like:

printf '%s\0' *.inp | xargs -0 -P 4 -n 1 \
  sh -c 'for f; do ./PHASE "$f" "${f%.inp}.out"; done' _

The -P 4 is important here: it sets the number of processes to run in parallel. The printf '%s\0' and -0 pair passes the file names NUL-delimited, so names containing spaces or newlines are handled safely.

If you have a very large number of inputs and they're fast to process, consider replacing -n 1 with a larger number, to increase the number of inputs each shell instance iterates over. This decreases shell startup costs, but also reduces granularity and, potentially, the level of parallelism.
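As a minimal sketch of that trade-off, the pipeline can be wrapped in a function so that the job count and the number of inputs per shell instance become parameters. The name run_phase_batched and its three arguments are hypothetical and chosen only for illustration, and -0 and -P are xargs extensions (available in GNU xargs), not POSIX.

```shell
#!/bin/sh
# run_phase_batched PROGRAM JOBS PER_SHELL   (hypothetical helper, for illustration)
# Runs PROGRAM on every *.inp file in the current directory, keeping up to
# JOBS processes busy and handing PER_SHELL inputs to each sh instance.
run_phase_batched() {
  prog=$1 jobs=$2 per_shell=$3
  set -- *.inp                 # positional parameters = input files
  [ -e "$1" ] || return 0      # the glob matched nothing, so there is no work
  printf '%s\0' "$@" |
    xargs -0 -P "$jobs" -n "$per_shell" \
      sh -c 'prog=$1; shift
             for f; do "$prog" "$f" "${f%.inp}.out"; done' _ "$prog"
}
```

Calling run_phase_batched ./PHASE 4 1 behaves like the command above. With jobs that take about 3 hours each, shell startup cost is negligible, so -n 1 is the sensible choice here; a larger value only pays off for many short jobs.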

That said, if you really want to run batches of four (per your question), letting all four finish before starting the next four, you could do something like the following. This introduces some inefficiency, since cores sit idle while the slowest job in each batch finishes, but it is what you asked for.

set -- *.inp                # set $@ to the list of files matching *.inp
while (( $# )); do          # until we exhaust that list...
  for ((i=0; i<4; i++)); do # loop over a batch of four...
    # as long as there's a next argument, start a process for it, and take it off the list
    [[ $1 ]] && ./PHASE "$1" "${1%.inp}.out" & shift
  done
  wait                      # ...and wait for running processes to finish before proceeding
done
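The loop above relies on bash syntax ((( )) and [[ ]]). For systems where the script will be run by a plain POSIX sh, the same batch-then-wait approach can be sketched as follows. The function name run_in_batches and its arguments are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# run_in_batches PROGRAM BATCH_SIZE   (hypothetical helper, for illustration)
# Starts up to BATCH_SIZE background jobs, waits for all of them to finish,
# then starts the next batch, until every *.inp file has been processed.
run_in_batches() {
  prog=$1 batch=$2
  set -- *.inp                 # positional parameters = input files
  [ -e "$1" ] || return 0      # the glob matched nothing, so there is no work
  while [ "$#" -gt 0 ]; do
    i=0
    while [ "$i" -lt "$batch" ] && [ "$#" -gt 0 ]; do
      "$prog" "$1" "${1%.inp}.out" &   # start one job in the background
      shift                            # take that file off the list
      i=$((i + 1))
    done
    wait                       # block until the whole batch has finished
  done
}
```

It would be called as run_in_batches ./PHASE 4. Like the bash version, it leaves cores idle whenever the jobs in a batch finish at different times.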
