Change text in argument for xargs (or GNU Parallel)
Question
I have a program that I can run in two ways: single-end or paired-end mode. Here's the syntax:
program <output-directory-name> <input1> [input2]
The output directory and at least one input are required. If I wanted to run this on three files, say samples A, B, and C, I would use something like find with xargs or parallel:
user@host:~/single$ ls
sampleA.txt sampleB.txt sampleC.txt
user@host:~/single$ find . -name "sample*" | xargs -i echo program {}-out {}
program ./sampleA.txt-out ./sampleA.txt
program ./sampleB.txt-out ./sampleB.txt
program ./sampleC.txt-out ./sampleC.txt
user@host:~/single$ find . -name "sample*" | parallel --dry-run program {}-out {}
program ./sampleA.txt-out ./sampleA.txt
program ./sampleB.txt-out ./sampleB.txt
program ./sampleC.txt-out ./sampleC.txt
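For the single-end case, a slightly more defensive form of the xargs command above is sketched below, assuming GNU findutils and coreutils (for `-print0`, `-0` and `sort -z`). It uses `-I {}`, which GNU xargs documents as the replacement for the deprecated `-i`, and NUL-separated names so that a file name containing spaces reaches `program` as a single argument. The scratch directory and the `echo` dry run exist only for the demonstration.

```shell
#!/bin/sh
# Sketch of the single-end dry run with two hardening tweaks:
#  - "-I {}" instead of "-i" (GNU xargs documents -i as deprecated);
#  - NUL-separated names (-print0 / -0), so spaces in names survive.
set -eu

# Demo setup only: a scratch directory holding the example inputs.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch sampleA.txt sampleB.txt sampleC.txt

# find's output order is unspecified, so sort the NUL-separated list.
# Remove "echo" to run the real program instead of printing commands.
find . -name 'sample*' -print0 \
  | sort -z \
  | xargs -0 -I {} echo program {}-out {}
```

The output name still keeps the `.txt` (for example `./sampleA.txt-out`), exactly as in the question's own runs.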
But when I want to run the program in "paired-end" mode, I need to give it two inputs. These are related files, but they can't simply be concatenated: you have to run the program with both as inputs. Files are named sensibly, e.g., sampleA_1.txt and sampleA_2.txt.
I want to be able to build these commands easily on the command line with something like xargs (or, preferably, parallel):
user@host:~/paired$ ls
sampleA_1.txt sampleB_1.txt sampleC_1.txt
sampleA_2.txt sampleB_2.txt sampleC_2.txt
user@host:~/paired$ find . -name "sample*_1.txt" | sed/awk? | parallel ?
program ./sampleA-out ./sampleA_1.txt ./sampleA_2.txt
program ./sampleB-out ./sampleB_1.txt ./sampleB_2.txt
program ./sampleC-out ./sampleC_1.txt ./sampleC_2.txt
Ideally, the command would strip off the _1.txt to create the output directory name (sampleA-out, etc.), but I really need to be able to take that argument and change the _1 to a _2 for the second input.
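Without sed or awk, the shell's own parameter expansion can do both renames. The sketch below is a plain POSIX sh loop rather than an xargs/parallel pipeline; `print_paired_commands` is a hypothetical helper name, the skip-if-the-`_2`-file-is-missing check is an added assumption, and `echo` keeps it a dry run.

```shell
#!/bin/sh
# Sketch: build each paired-end command with POSIX parameter expansion
# rather than sed/awk. ${first%_1.txt} removes a trailing "_1.txt", which
# gives the stem used for both the output name and the "_2" mate.
set -eu

# Demo setup only: a scratch directory with the example file pairs.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch sampleA_1.txt sampleA_2.txt sampleB_1.txt sampleB_2.txt \
      sampleC_1.txt sampleC_2.txt

print_paired_commands() {
  for first in ./sample*_1.txt; do
    stem=${first%_1.txt}          # ./sampleA_1.txt -> ./sampleA
    second=${stem}_2.txt          # ./sampleA -> ./sampleA_2.txt
    if [ ! -e "$second" ]; then   # skip a sample whose _2 file is missing
      echo "no mate for $first" >&2
      continue
    fi
    # Dry run: drop "echo" to actually launch the program.
    echo program "${stem}-out" "$first" "$second"
  done
}

print_paired_commands
```

Because the glob expands in sorted order, the output matches the three `program` lines shown in the question. This runs the samples one after another; parallel is still the tool for running them concurrently.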
I know this is dead simple with a script; I did this in Perl with a quick regular expression substitution. But I would love to be able to do this with a quick one-liner.
Thanks in advance.
Solution
> I did this in Perl with a quick regular expression substitution. But I would love to be able to do this with a quick one-liner.
Perl has one-liners, too, just as sed and awk do. You can write:
find . -name "sample*_1.txt" | perl -pe 's/_1\.txt$//' | parallel program {}-out {}_1.txt {}_2.txt
(The -e flag means "the next argument is the program text"; the -p flag means "the program should be run in a loop: for each line of input, set $_ to that line, run the program, then print $_".)