GNU Parallel: How to start job number replacement string at zero
Problem description
I'm very pleased with the speed of using GNU parallel to split multi-GB CSV database export files into manageable chunks. However, the problem I'm having is that I'd like my output file names to be in the format some_table.csv.part_0000.csv and start at zero (the import tool requires this). Getting "0001" was a challenge, but I managed to use printf to achieve this. I can't get the decrement to work though.
My command:
FILE=some_table; parallel -v --joblog split.log --pipepart --recend '-- EOL\n' --block 25M "cat > $FILE.csv.part_$(printf "%04d" {#}).csv" :::: $FILE.csv
Doing things like expression expansion ($FILE.csv.part_$(({#}-1)).csv) doesn't work because {#} confuses the inner subshell. The same goes for PART=$(({#}-1)); cat > $FILE.csv.part_$PART.csv.
Any suggestions?
Recommended answer
Use the {= =} construct:
FILE=some_table; parallel -v --joblog split.log --pipepart --recend '-- EOL\n' --block 25M "cat > $FILE.csv.part_"'{=$_=sprintf("%04d",$job->seq()-1)=}'".csv" :::: $FILE.csv
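The padding and decrement done by the Perl expression can be checked in plain shell before running a long split. The sketch below is only an illustration: part_name is a hypothetical helper, not part of GNU Parallel, and it assumes job numbers start at 1, as {#} does.

```shell
# Hypothetical helper: mimics sprintf("%04d", seq - 1) for a 1-based job number.
# $1 = table name, $2 = 1-based job sequence number
part_name() {
    printf '%s.csv.part_%04d.csv\n' "$1" "$(( $2 - 1 ))"
}

part_name some_table 1     # first job
part_name some_table 12    # twelfth job
```

The first call prints some_table.csv.part_0000.csv and the second some_table.csv.part_0011.csv, which is the naming the import tool in the question expects.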
If you are going to use it a lot then define your own replacement string by putting this into ~/.parallel/config:
--rpl '{0000#} $_=sprintf("%04d",$job->seq()-1)'
Then use {0000#}:
seq 11 | parallel echo {0000#}
If you just want the numbers to be fixed width (and not necessarily 4 digits):
--rpl '{0#} $f="%0".int(1+log(total_jobs()-1)/log(10))."d";$_=sprintf($f,$job->seq()-1)'
Then use {0#}:
seq 11 | parallel echo {0#}
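The width in that replacement string is the number of digits in the largest zero-based job number, total_jobs - 1. As a rough illustration of the same idea in plain shell, the sketch below counts the digits by string length instead of using a logarithm, which sidesteps floating-point rounding. Both function names are hypothetical and not part of GNU Parallel.

```shell
# Hypothetical helper: number of digits needed for zero-based job numbers.
# $1 = total number of jobs
zero_based_width() {
    last=$(( $1 - 1 ))      # largest zero-based job number
    echo "${#last}"         # its length in characters = digits needed
}

# Hypothetical helper: print a 1-based job number as a zero-based, padded one.
# $1 = 1-based job number, $2 = total number of jobs
zero_based_job() {
    width=$(zero_based_width "$2")
    printf "%0${width}d\n" "$(( $1 - 1 ))"
}

zero_based_job 1 11     # first of 11 jobs
zero_based_job 11 11    # last of 11 jobs
```

With 11 jobs the width is 2, so the two calls print 00 and 10, matching what seq 11 | parallel echo {0#} is meant to produce for its first and last jobs.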
On a different note: Why save it to files at all? Why not pass it directly to the database importer and use --retries/--retry-failed to retry failed chunks?
If you want it for the job slot:
parallel --rpl '{0000%} $_=sprintf("%04d",$job->slot())' echo {0000%} ::: {1..100}
You can also use a dynamic replacement string:
--rpl '{(0+?)%} $l=length $$1; $_=sprintf("%0${l}d",$job->slot())'
--rpl '{(0+?)#} $l=length $$1; $_=sprintf("%0${l}d",$job->seq())'
parallel echo {0%} ::: {1..100}
parallel echo {0#} ::: {1..100}
parallel echo {00%} ::: {1..100}
parallel echo {00#} ::: {1..100}
parallel echo {000%} ::: {1..100}
parallel echo {000#} ::: {1..100}
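In the dynamic definitions, the run of zeros captured by (0+?) sets the field width: its length becomes the number in the %0Nd format. The plain-shell sketch below mirrors that step only. pad_like is a hypothetical helper, not something GNU Parallel provides.

```shell
# Hypothetical helper: pad a number to as many digits as there are zeros in the token.
# $1 = run of zeros taken from the replacement string, $2 = number to pad
pad_like() {
    printf "%0${#1}d\n" "$2"    # ${#1} is the length of the zero run
}

pad_like 0 7      # width 1, like {0#}
pad_like 00 7     # width 2, like {00#}
pad_like 000 7    # width 3, like {000#}
```

These print 7, 07 and 007. Numbers with more digits than the requested width are printed in full rather than truncated.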
Since version 20210222 you can do:
parallel --plus echo {0%} ::: {1..100}
parallel --plus echo {0#} ::: {1..100}
which will automatically detect the needed leading zeros.