GNU Parallel: How to start job number replacement string at zero


Question

I'm very pleased with the speed of GNU parallel when splitting multi-GB CSV database export files into manageable chunks. However, I'd like the output file names to have the format some_table.csv.part_0000.csv and to start at zero (the import tool requires this). Getting the zero-padded "0001" was a challenge, but I managed it with printf. I can't get the decrement to work, though.

My command:

FILE=some_table; parallel -v --joblog split.log --pipepart --recend '-- EOL\n' --block 25M "cat > $FILE.csv.part_$(printf "%04d" {#}).csv" :::: $FILE.csv

Things like arithmetic expansion ($FILE.csv.part_$(({#}-1)).csv) don't work because {#} confuses the inner subshell. The same goes for PART=$(({#}-1)); cat > $FILE.csv.part_$PART.csv.

Any suggestions?

Answer

Use the {= =} construct:

FILE=some_table;  parallel -v --joblog split.log --pipepart --recend '-- EOL\n' --block 25M "cat > $FILE.csv.part_"'{=$_=sprintf("%04d",$job->seq()-1)=}'".csv" :::: $FILE.csv
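Before running this against a multi-GB file, the naming expression can be checked on its own. The following is a minimal sketch, assuming GNU Parallel is installed; the helper name preview_names and the inputs a, b, c are placeholders for illustration and are not part of the original command:

```shell
# Hypothetical helper: print the file name each job would write to,
# using the same {= =} expression as the command above.
preview_names() {
    # -k keeps the output in job order
    parallel -k echo 'some_table.csv.part_{=$_=sprintf("%04d",$job->seq()-1)=}.csv' ::: "$@"
}

preview_names a b c
# some_table.csv.part_0000.csv
# some_table.csv.part_0001.csv
# some_table.csv.part_0002.csv
```

The expression is single-quoted here so the calling shell passes it to parallel untouched.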

If you are going to use it a lot, define your own replacement string by putting this into ~/.parallel/config:

--rpl '{0000#} $_=sprintf("%04d",$job->seq()-1)'

Then use {0000#}:

seq 11 | parallel echo {0000#}
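To try the replacement string before committing it to the config file, the same --rpl definition can be given on the command line. This is a small sketch assuming GNU Parallel is installed; the wrapper name numbered is hypothetical, and -k is added only to keep the output in job order:

```shell
# Hypothetical wrapper: read lines on stdin and print one zero-based,
# 4-digit job number per line, with --rpl defined inline.
numbered() {
    parallel -k --rpl '{0000#} $_=sprintf("%04d",$job->seq()-1)' echo '{0000#}'
}

seq 3 | numbered
# 0000
# 0001
# 0002
```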

If you just want the numbers to be fixed width (and not necessarily 4 digits):

--rpl '{0#} $f="%0".int(1+log(total_jobs()-1)/log(10))."d";$_=sprintf($f,$job->seq()-1)'

Then use {0#}:

seq 11 | parallel echo {0#}
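The width in that definition is the number of decimal digits in the largest zero-based job number, total_jobs()-1. The sketch below illustrates the same arithmetic in plain bash, without GNU Parallel; it counts digits by string length instead of using a logarithm, and the function names pad_width and zero_based are made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: width needed for zero-based job numbers 0..total-1.
pad_width() {
    local max=$(( $1 - 1 ))   # largest zero-based job number
    echo "${#max}"            # number of decimal digits in it
}

# Hypothetical helper: zero-based, zero-padded number for job $1 out of $2 jobs.
zero_based() {
    printf "%0$(pad_width "$2")d\n" "$(( $1 - 1 ))"
}

zero_based 1 11    # first of 11 jobs, prints 00
zero_based 11 11   # last of 11 jobs, prints 10
```

With 11 jobs the largest number is 10, so two digits are enough, which is the width the Perl expression computes for the seq 11 example.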

On a different note: why save it to files at all? Why not pass it directly to the database importer and use --retries/--retry-failed to retry failed chunks?

If you want it for the job slot:

parallel --rpl '{0000%} $_=sprintf("%04d",$job->slot())' echo {0000%} ::: {1..100}

You can also use a dynamic replacement string:

--rpl '{(0+?)%} $l=length $$1; $_=sprintf("%0${l}d",$job->slot())'
--rpl '{(0+?)#} $l=length $$1; $_=sprintf("%0${l}d",$job->seq())'

parallel echo {0%} ::: {1..100}
parallel echo {0#} ::: {1..100}
parallel echo {00%} ::: {1..100}
parallel echo {00#} ::: {1..100}
parallel echo {000%} ::: {1..100}
parallel echo {000#} ::: {1..100}

Since version 20210222 you can do:

parallel --plus echo {0%} ::: {1..100}
parallel --plus echo {0#} ::: {1..100}

which will automatically detect the needed leading zeros.
