Using GNU parallel to parallelize a for loop


Problem description

I have seen several questions about this topic, but I have not been able to translate the answers to my specific problem. I have a for loop that goes through subdirectories and executes a .sh script on a compressed text file inside each directory. I want to parallelize this process, but I'm struggling to apply GNU parallel.

This is my loop:

for d in ./*/ ; do (cd "$d" && script.sh); done

I understand I need to feed a list into parallel, so I have been trying this:

ls -d */ | parallel cd && script.sh

While this appears to get started, I get an error when gzip tries to decompress one of the txt files inside the directory, saying the file does not exist:

gzip: *.txt.gz: No such file or directory

However, when I run the original for loop, I have no issues aside from it taking a century to finish. Also, I only get the gzip error once when using parallel, which is weird considering I have over 1000 subdirectories.

My questions are:

  1. How do I get parallel to work in my case? How do I get it to parallelize the application of a .sh script to 1000s of files in their own subdirectories? I.e., what is the solution to my problem? I gotta make progress.

  2. What am I missing? Syntax, loop, bad script? I want to learn.

  3. Is parallel actually attempting to run all these .sh scripts in parallel? Why don't I get an error for every .txt.gz file?

  4. Is parallel the best option for this application? Is there another option that is better suited to my needs?

Recommended answer

Two problems:

  1. In:

ls -d */ | parallel cd && script.sh

What is parallelized is just cd, not script.sh. The shell parses the line as (ls -d */ | parallel cd) && script.sh, so script.sh is executed only once, after all the parallel cd jobs have run, and only if they reported no error. It is the same as:

ls -d */ | parallel cd
if [ $? -eq 0 ]; then script.sh; fi

  2. The target directory does not reach cd where it is needed. When the command contains no replacement string, GNU parallel appends each input to the end of the command line, so once the command is quoted as 'cd && script.sh', cd runs without an argument and just changes to your home directory. In your original command, each cd ran in its own short-lived shell and the final script.sh was executed in the current directory (from where you invoked the command), where there are probably no *.txt.gz files, thus the error.

    You can check the effect of the first problem yourself with:

    $ mkdir /tmp/foobar && cd /tmp/foobar && mkdir a b c
    $ ls -d */ | parallel cd && pwd
    /tmp/foobar
    

    The output of pwd is printed only once, even if you have more than one input directory. You can fix this by quoting the command, and then check the second problem with:

    $ ls -d */ | parallel 'cd && pwd'
    /homes/myself
    /homes/myself
    /homes/myself
    

    You should see as many pwd outputs as there are input directories, but the output is always the same: your home directory. You can fix the second problem by using the {} replacement string, which is substituted with the current input. Check it with:

    $ ls -d */ | parallel 'cd {} && pwd'
    /tmp/foobar/a
    /tmp/foobar/b
    /tmp/foobar/c
    

    Now all input directories should be properly listed in the output.

    For your specific problem, this should work:

    ls -d */ | parallel 'cd {} && script.sh'
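
As a hedged variant of the command above, the sketch below avoids parsing ls output by passing the glob straight to GNU parallel with :::, previews the generated command lines with --dry-run, and caps concurrency with -j. The scratch directory, the sample .txt.gz files, the stand-in script.sh, and the job count of 4 are assumptions made for illustration; they are not from the original question.

```shell
#!/usr/bin/env bash
# Sketch only: everything below runs inside a throwaway directory.
set -eu

# Hypothetical sandbox with three subdirectories, each holding one .txt.gz file.
work=$(mktemp -d)
mkdir "$work/a" "$work/b" "$work/c"
for d in a b c; do
  printf 'sample %s\n' "$d" | gzip > "$work/$d/data.txt.gz"
done

# Stand-in for the real script.sh: decompress every *.txt.gz in the current directory.
cat > "$work/script.sh" <<'EOF'
#!/bin/sh
gzip -d ./*.txt.gz
EOF
chmod +x "$work/script.sh"

cd "$work"

# Only proceed if the installed "parallel" really is GNU parallel.
ver=$(parallel --version 2>/dev/null || true)
case $ver in
  *"GNU parallel"*)
    # Print the command lines that would be run, without running them.
    parallel --dry-run 'cd {} && ../script.sh' ::: */
    # Run the jobs, at most 4 at a time; ::: supplies the glob matches as inputs.
    parallel -j 4 'cd {} && ../script.sh' ::: */
    ;;
  *)
    echo "GNU parallel not found; nothing was run" >&2
    ;;
esac
```

Because the stand-in script lives in the scratch directory, it is called as ../script.sh here; with a script.sh that is on your PATH, the command from the answer works unchanged.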
    
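The first problem is plain shell grammar and can be reproduced without GNU parallel. The following sketch, in which the while loop is only a stand-in for parallel, shows that && applies to a whole pipeline and that a cd done in a child shell does not move the caller:

```shell
#!/usr/bin/env bash
# Sketch: how the shell groups "producer | consumer && final".
start=$PWD

# The pipeline is one unit and && applies to it as a whole, so the
# right-hand command runs once, after every input line has been consumed.
out=$(printf '%s\n' a b c | while read -r d; do echo "job $d"; done && echo "final step")
echo "$out"

# A cd performed inside a child shell never changes the caller's directory.
( cd / && pwd )
pwd
[ "$PWD" = "$start" ] && echo "still in the starting directory"
```

This is why the gzip error appeared only once: the final command ran a single time, in the directory the whole line was started from.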
