Run a specifiable number of commands in parallel - contrasting xargs -P, GNU parallel, and "moreutils" parallel
Question
I'm trying to run multiple mongodump commands against 26 servers in a bash script.
I can run 3 commands like
mongodump -h staging .... &
mongodump -h production .... &
mongodump -h web ... &
at the same time, and when one finishes I want to start another mongodump.
I can't run all 26 mongodump commands at the same time, because the server would run out of CPU. The maximum is 3 mongodump processes at a time.
Recommended answer
You can use the -P option of xargs to run a specifiable number of invocations in parallel.
Note that the -P option is not mandated by POSIX, but both GNU xargs and BSD/macOS xargs support it.
xargs -P 3 -n 1 mongodump -h <<<'staging production web more stuff and so on'
This runs mongodump -h staging, mongodump -h production, and mongodump -h web in parallel; as soon as any one of them finishes, xargs starts the next invocation (mongodump -h more, then mongodump -h stuff, and so on), so that up to 3 invocations are running at any time.
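The pattern above can be tried without touching any database. The following is only a sketch: an sh -c stub stands in for mongodump, and the host names are placeholders rather than anything from the original question.

```shell
# Placeholder host names; in practice these would be the 26 servers.
hosts='staging production web more stuff'

# The sh -c stub stands in for: mongodump -h "$1" ...
# The trailing _ fills $0, so each host arrives in the stub as $1.
results=$(
  printf '%s\n' $hosts |
    xargs -P 3 -n 1 sh -c 'echo "dumped $1"' _ |
    sort   # completion order is not deterministic, so sort for stable display
)
printf '%s\n' "$results"
```

Replacing the stub with the real mongodump invocation keeps the same concurrency limit of 3.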
-n 1 grabs a single argument from the input stream for each call to mongodump; adjust as needed, single- or double-quoting arguments in the input if necessary.
Note: GNU xargs - but not BSD xargs - supports -P 0, where 0 means: "run as many processes as possible simultaneously."
By default, the arguments supplied via stdin are appended to the specified command.
If you need to control where the respective arguments are placed in the resulting commands,
- provide the arguments line by line
- use -I {} to indicate that, and to define {} as the placeholder for each input line.
xargs -P 3 -I {} mongodump -h {} after <<<$'staging\nproduction\nweb\nmore\nstuff'
Now each input line is substituted for {}, which allows the argument after to follow it.
Note, however, that each input line is invariably passed as a single argument.
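To see what -I {} produces before running anything, echo can be put in front of the command as a dry run. This is a sketch only: /backups is a hypothetical output directory, and the host names are placeholders.

```shell
# echo turns this into a dry run: the commands are printed, not executed.
# /backups is a hypothetical directory used purely for illustration.
cmds=$(
  printf '%s\n' staging production web |
    xargs -P 3 -I {} echo mongodump -h {} --out /backups/{} |
    sort   # parallel jobs may finish in any order
)
printf '%s\n' "$cmds"
```

Each input line replaces every {} in the command template, including the one embedded in the path.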
BSD/macOS xargs would allow you to combine -n with -J {}, without needing to provide line-based input, but GNU xargs doesn't support -J.
In short: only BSD/macOS xargs allows you to combine placement of the input arguments with reading multiple arguments at once.
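One workaround that is not part of the original answer, but should behave the same with both GNU and BSD/macOS xargs, is to let xargs append the arguments to a small sh -c wrapper and do the placement inside the wrapper via $1, $2, and so on. A sketch, with hypothetical host/port pairs and echo as a dry run:

```shell
# Hypothetical host/port pairs; -n 2 passes two arguments per invocation.
# Inside the wrapper, "$1" and "$2" can be placed anywhere in the command.
placed=$(
  echo 'staging 27017 production 27018' |
    xargs -P 3 -n 2 sh -c 'echo mongodump -h "$1" --port "$2" after' _ |
    sort   # parallel jobs may finish in any order
)
printf '%s\n' "$placed"
```

The _ after the wrapper script becomes $0, so the first argument supplied by xargs lands in $1.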
Note that xargs does not serialize the stdout output of the commands it runs in parallel, so output from different processes can arrive interleaved.
Use GNU parallel to avoid this problem - see below.
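If only xargs is available, one way to sidestep interleaving (an addition to the original answer, offered as a sketch) is to send each job's stdout and stderr to its own log file. The stub commands and host names below are placeholders for the real mongodump calls.

```shell
# Temporary directory for the per-job logs; exported so the sh -c wrapper sees it.
logdir=$(mktemp -d)
export logdir

# Each job writes both stdout and stderr to "$logdir/<host>.log",
# so output from different jobs can never mix on one line.
printf '%s\n' staging production web |
  xargs -P 3 -n 1 sh -c '
    { echo "start $1"; echo "warn $1" >&2; echo "end $1"; } >"$logdir/$1.log" 2>&1
  ' _

# Show the logs one job at a time.
cat "$logdir"/*.log
```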
xargs has the advantage of being a standard utility, so on platforms where it supports -P, there are no prerequisites.
In the Linux world (though also on macOS via Homebrew) there are two purpose-built utilities for running commands in parallel, which, unfortunately, share the same name; typically, you must install them on demand:
- parallel (a binary) from the moreutils package - see its home page.
- The much more powerful GNU parallel (a Perl script) from the parallel package (thanks, twalberg) - see its home page.
If you already have a parallel utility, parallel --version will tell you which one it is (GNU parallel reports a version number and copyright information, "moreutils" parallel complains about an invalid option and shows a syntax summary).
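A script that has to cope with either implementation could automate that check. The following sketch relies only on the behavior described above (GNU parallel identifies itself in its --version output); the variable name flavor is made up for illustration.

```shell
# Classify whatever "parallel" is on the PATH, if any.
if ! command -v parallel >/dev/null 2>&1; then
  flavor=none
elif parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  flavor=gnu
else
  # Anything else, e.g. the moreutils binary, which rejects --version.
  flavor=other
fi
echo "parallel flavor: $flavor"
```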
# "moreutils" parallel: the pass-through arguments follow -- on the command line.
parallel -j 3 -n 1 mongodump -h -- staging production web more stuff and so on
# Using -i to control placement of the argument, via {}
# Only *1* argument at a time is supported in that case.
parallel -j 3 -i mongodump -h {} after -- staging production web more stuff and so on
Unlike xargs, the "moreutils" parallel implementation doesn't take the arguments to pass through from stdin; all pass-through arguments must be passed on the command line, following --.
From what I can tell, the only features the "moreutils" parallel implementation offers beyond what xargs can do are:
- The -l option allows delaying further invocations until the system load average is below the specified threshold.
- Possibly this (from the man page): "stdout and stderr is serialised through a corresponding internal pipe, in order to prevent annoying concurrent output behaviour.", though I've found this not to be the case in the version whose man page is dated 2009-07-02 - see the last section.
Tip of the hat to Ole Tange for his help.
# GNU parallel: the pass-through arguments are supplied via stdin, one per line.
parallel -P 3 -n 1 mongodump -h <<<$'staging\nproduction\nweb\nmore\nstuff\nand\nso\non'
# Alternative, using ::: followed by the target-command arguments.
parallel -P 3 -n 1 mongodump -h ::: staging production web more stuff and so on
# Using -n 1 and {} to control placement of the argument.
# Note that using -N rather than -n would allow per-argument placement control
# with {1}, {2}, ...
parallel -P 3 -n 1 mongodump -h {} after <<<$'staging\nproduction\nweb\nmore\nstuff\nand'
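The -N variant mentioned in the comments above might look like the following sketch, which assumes GNU parallel is installed and skips itself otherwise. The host/port pairs are hypothetical, echo makes it a dry run, and -k (--keep-order) is added only to make the output order predictable.

```shell
# Run only if the installed "parallel" is the GNU implementation.
if parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  # -N 2 reads two arguments per invocation; {1} and {2} place them individually.
  pairs=$(parallel -k -N 2 echo mongodump -h {1} --port {2} ::: staging 27017 production 27018)
else
  pairs=''
  echo 'GNU parallel not found; skipping' >&2
fi
printf '%s\n' "$pairs"
```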
- As with xargs, pass-through arguments are supplied via stdin, but GNU parallel also supports placing them on the command line, after a configurable separator (::: by default).
- Unlike with xargs, each input line is considered a single argument.
- Caveat: If your command involves quoted strings, you must use -q to pass them through as distinct arguments; e.g., parallel -q sh -c 'echo $0' ::: there only works with -q.
- As with GNU xargs, you can use -P 0 to run as many invocations as possible at once, taking full advantage of the machine's capabilities, meaning, according to Ole, "until GNU Parallel hits a limit (file handles and processes)".
- Conveniently, omitting -P doesn't just run one process at a time, as the other utilities do, but runs one process per CPU core.
By default, output from commands being executed in parallel is automatically serialized (grouped) on a per-process basis, to avoid interleaved output.
- This is generally desirable, but note that it means that you'll only start to see the other commands' output once the first one that has created output has terminated.
- Use option --line-buffer (--lb in more recent versions) to opt out of this behavior, or -u (--ungroup) to allow even a single output line to mix output from different processes; see the manual for details.
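As a small illustration of line-level output, the sketch below combines --line-buffer with --tag, which prefixes each output line with the argument that produced it. It assumes GNU parallel is installed and skips itself otherwise; echo stands in for the real command, and the host names are placeholders.

```shell
# Run only if the installed "parallel" is the GNU implementation.
if parallel --version </dev/null 2>/dev/null | grep -q 'GNU parallel'; then
  # --line-buffer passes complete lines through as soon as they are available;
  # --tag labels each line with its argument so mixed output stays attributable.
  tagged=$(parallel -P 3 --line-buffer --tag echo dumped ::: staging production web | sort)
else
  tagged=''
  echo 'GNU parallel not found; skipping' >&2
fi
printf '%s\n' "$tagged"
```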
GNU parallel, which is designed to be a better successor to xargs, offers many more features: a notable example is the ability to perform sophisticated transformations on the pass-through arguments, optionally based on Perl regular expressions; see also: man parallel and man parallel_tutorial.
The following commands test how xargs and the two parallel implementations deal with interleaved output from commands being run in parallel - whether they show output as it arrives, or try to serialize it.
There are 2 levels of serialization, both of which introduce overhead:
- Line-level serialization: Prevent partial lines from different processes from being mixed on a single output line.
- Process-level serialization: Ensure that all output lines from a given process are grouped together. This is the most user-friendly method, but note that it means that you'll only start to see the other commands' output (in sequence) once the first one that has created output has terminated.
From what I can tell, only GNU parallel offers any serialization (despite what the "moreutils" parallel man page dated 2009-07-02 says [1]), and it supports both methods.
The commands below assume the existence of an executable script ./tst with the following content:
#!/usr/bin/env bash
printf "$$: [1/2] entering with arg(s): $*"
sleep $(( $RANDOM / 16384 ))
printf " $$: [2/2] finished entering\n"
echo " $$: stderr line" >&2
echo "$$: stdout line"
sleep $(( $RANDOM / 8192 ))
echo " $$: exiting"
xargs (both the GNU and BSD/macOS implementations, as found on Ubuntu 16.04 and macOS 10.12):
No serialization happens: a single output line can contain output from multiple processes.
$ xargs -P 3 -n 1 ./tst <<<'one two three'
2593: [1/2] entering with arg(s): one2594: [1/2] entering with arg(s): two 2593: [2/2] finished entering
 2593: stderr line
2593: stdout line
2596: [1/2] entering with arg(s): three 2593: exiting
 2594: [2/2] finished entering
 2594: stderr line
2594: stdout line
 2596: [2/2] finished entering
 2596: stderr line
2596: stdout line
 2594: exiting
 2596: exiting
"moreutils" parallel (version whose man page is dated 2009-07-02):
No serialization happens: a single output line can contain output from multiple processes.
$ parallel -j 3 ./tst -- one two three
3940: [1/2] entering with arg(s): one3941: [1/2] entering with arg(s): two3942: [1/2] entering with arg(s): three 3941: [2/2] finished entering
 3941: stderr line
3941: stdout line
 3942: [2/2] finished entering
 3942: stderr line
3942: stdout line
 3940: [2/2] finished entering
 3940: stderr line
3940: stdout line
 3941: exiting
 3942: exiting
GNU parallel (version 20170122):
Process-level serialization (grouping) happens by default. Use --line-buffer (--lb in newer versions) to choose line-level serialization instead, or opt out of any kind of serialization with -u (--ungroup).
Note how, in each group, stderr output comes after stdout output (whereas the man page that comes with version 20170122 claims that stderr output comes first).
$ parallel -P 3 ./tst ::: one two three
2544: [1/2] entering with arg(s): one 2544: [2/2] finished entering
2544: stdout line
 2544: exiting
 2544: stderr line
2549: [1/2] entering with arg(s): three 2549: [2/2] finished entering
2549: stdout line
 2549: exiting
 2549: stderr line
2546: [1/2] entering with arg(s): two 2546: [2/2] finished entering
2546: stdout line
 2546: exiting
 2546: stderr line
[1] "stdout and stderr is serialised through a corresponding internal pipe, in order to prevent annoying concurrent output behaviour."
Do tell me if I'm missing something.