bash: process list of files in chunks

Problem description

Setting:

I have some hundred files, named something like input0.dat, input1.dat, ..., input150.dat, which I need to process using some command cmd (which basically merges the contents of all files). The cmd takes as first option the output filename and then a list of all input filenames:

./cmd output.dat input1.dat input2.dat [...] input150.dat

Problem:

The problem is that the script can only handle about 10 files at a time due to memory issues (don't blame me for that). Thus, instead of using bash wildcard expansion like

./cmd output.dat *dat

I need to do something like

./cmd temp_output0.dat file0.dat file1.dat [...] file9.dat
[...]
./cmd temp_outputN.dat fileN0.dat fileN1.dat [...] fileN9.dat

Afterwards I can merge the temporary outputs.

./cmd output.dat temp_output0.dat [...] temp_outputN.dat

How do I script this efficiently in bash?

I tried, without success, e.g.

for filename in `echo *dat | xargs -n 3`; do [...]; done

The problem is that this again processes all files at once, because the output lines of xargs get concatenated.
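
A small sketch of why the chunk boundaries get lost (the file names below are made up for illustration): xargs does print three names per line, but an unquoted command substitution is split on every whitespace character, newlines included, so the for loop never sees those lines as units.

```bash
#!/usr/bin/env bash
# Hypothetical file names, used only for illustration.
names="a.dat b.dat c.dat d.dat e.dat"

# xargs -n 3 prints at most three names per line (two lines here).
lines=$(( $(echo $names | xargs -n 3 | wc -l) ))

# The unquoted command substitution is split on all whitespace, so the
# loop body runs once per file name rather than once per chunk.
count=0
for filename in `echo $names | xargs -n 3`; do
    count=$((count + 1))
done

echo "xargs lines: $lines, loop iterations: $count"
```

Reading the xargs output line by line, rather than through a command substitution, is what keeps the groups of three together.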

Edit: Note that I need to specify an output filename as the first command-line argument when calling cmd!

Recommended answer

You can do it like this:

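i=0
opfiles=
mkfifo /tmp/foo
# Write the file names, three per line, into the fifo from a background job.
echo *dat | xargs -n 3 >/tmp/foo &
# Reading from the fifo (not from a pipe) keeps the loop in the current shell,
# so i and opfiles are still set after the loop.
while read -r threefiles; do
    # $threefiles is left unquoted on purpose so it splits into three arguments.
    ./cmd "tmp_output$i.dat" $threefiles
    opfiles="$opfiles tmp_output$i.dat"
    ((i++))
done </tmp/foo
rm -f /tmp/foo
wait
./cmd output.dat $opfiles
rm $opfiles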

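You need to use a fifo so that the values of i and opfiles (the list of temporary files for the final merge) are still available after the loop. Piping directly into while would run the loop in a subshell, and both variables would be lost when it exits.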
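
The difference can be seen in a minimal sketch that counts lines both ways. The variable names and the temporary directory created with mktemp are assumptions made for this illustration, not part of the answer.

```bash
#!/usr/bin/env bash
# Piping into while: in bash's default configuration the loop runs in a
# subshell, so the increments are lost once the pipeline finishes.
piped=0
printf 'a\nb\nc\n' | while read -r line; do
    piped=$((piped + 1))
done

# Redirecting from a fifo: the loop runs in the current shell.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/fifo"
printf 'a\nb\nc\n' >"$tmpdir/fifo" &
redirected=0
while read -r line; do
    redirected=$((redirected + 1))
done <"$tmpdir/fifo"
wait
rm -r "$tmpdir"

echo "piped=$piped redirected=$redirected"
```

With bash's default settings the first counter stays at 0 while the second reaches 3.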

If you want, you can background the inner invocation of ./cmd and put a wait before the last invocation of cmd:

i=0
opfiles=
mkfifo /tmp/foo
echo *dat | xargs -n 3 >/tmp/foo&
while read threefiles; do
    # The trailing & runs each chunk in the background; wait below collects them.
    ./cmd tmp_output$i.dat $threefiles &
    opfiles="$opfiles tmp_output$i.dat"
    ((i++)) 
done </tmp/foo
rm -f /tmp/foo
wait
./cmd output.dat $opfiles
rm $opfiles

Update: If you want to avoid using a fifo entirely, you can use process substitution to emulate it, so the first version can be rewritten as:

i=0
opfiles=()
while read threefiles; do
    ./cmd tmp_output$i.dat $threefiles
    opfiles+=("tmp_output$i.dat")
    ((i++)) 
done < <(echo *dat | xargs -n 3)
./cmd output.dat "${opfiles[@]}"
rm "${opfiles[@]}"

This again avoids piping into the while loop; reading from a redirection instead keeps the opfiles variable set after the loop.
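
As a variation that is not part of the answer above, the chunking can also be done with bash array slices instead of xargs, which avoids word-splitting problems with unusual file names. This is only a sketch: merge_in_chunks is a hypothetical helper name, and it assumes the merge command takes the output file first and the inputs after it, as ./cmd does in the question.

```bash
#!/usr/bin/env bash
# merge_in_chunks is a hypothetical helper, not part of the original answer.
# Usage: merge_in_chunks MERGE_CMD CHUNK_SIZE OUTPUT INPUT...
# MERGE_CMD is assumed to accept "OUTPUT INPUT..." like ./cmd in the question.
merge_in_chunks() {
    local merge_cmd=$1 chunk_size=$2 output=$3
    shift 3
    local inputs=("$@") partials=() i part

    # Walk over the inputs chunk_size at a time using array slices.
    for ((i = 0; i < ${#inputs[@]}; i += chunk_size)); do
        part="tmp_output$((i / chunk_size)).dat"
        "$merge_cmd" "$part" "${inputs[@]:i:chunk_size}" || return 1
        partials+=("$part")
    done

    # Merge the partial results, then remove them.
    "$merge_cmd" "$output" "${partials[@]}" || return 1
    rm -f "${partials[@]}"
}
```

It could be called as `merge_in_chunks ./cmd 10 output.dat input*.dat`. As with the loops above, the final merge receives all partial files at once, so if their number exceeds what cmd can handle, another round of chunking would be needed.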
