Bash: Loop through file and read substring as argument, execute multiple instances

Problem description

How it is now

I currently have a script running under Windows that frequently retrieves recursive file listings from a list of servers.

I use an AutoIt (job manager) script to execute 30 parallel instances of lftp (still on Windows), doing this:

lftp -e "find .; exit" <serveraddr>

The file used as input for the job manager is a plain text file, and each line is formatted like this:

<serveraddr>|...

其中"..."是不重要的数据.我需要运行lftp的多个实例以实现最佳性能,因为单个实例的性能取决于服务器的响应时间.

where "..." is unimportant data. I need to run multiple instances of lftp in order to achieve maximum performance, because single instance performance is determined by the response time of the server.

Each lftp.exe instance pipes its output to a file named

<serveraddr>.txt

How it needs to be

Now I need to port this whole thing over to a Linux dedicated server (Ubuntu, with lftp installed). From my previous, very(!) limited experience with Linux, I guess this will be quite simple.

What do I need to write, and with what? For example, do I still need a job manager script, or can this be done in a single script? How do I read from the file (I guess this will be the easy part), and how do I keep a maximum of 30 instances running (maybe even with a timeout, because extremely unresponsive servers can clog the queue)?

Thanks!

Recommended answer

Parallel processing

I'd use GNU/parallel. It isn't distributed by default, but it can be installed on most Linux distributions from the default package repositories. It works like this:

parallel echo ::: arg1 arg2

will execute echo arg1 and echo arg2 in parallel.

So the easiest approach is to create a script that synchronizes your server in bash/perl/python - whatever suits your fancy - and execute it like this:

parallel ./script ::: server1 server2

The script could look like this:

#!/bin/sh
# $0 holds the program name, $1 holds the first argument.
# $1 will get passed in by GNU/parallel; we save it to a variable.
server="$1"
lftp -e "find .; exit" "$server" > "$server-files.txt"

lftp seems to be available for Linux as well, so you don't need to change the FTP client.

To run a maximum of 30 instances at a time, pass -j30 like this: parallel -j30 echo ::: 1 2 3
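
Since the question also asks about a timeout for extremely unresponsive servers, GNU/parallel's --timeout option can be combined with -j30 to kill jobs that run too long; this is a hedged addition not part of the original answer, and the 300-second value is only an illustrative choice:

parallel -j30 --timeout 300 ./script ::: server1 server2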

Now how do you transform the specification file containing <server>|... entries into GNU/parallel arguments? Easy - first, filter the file so it contains just the host names:

sed 's/|.*$//' server-list.txt

sed is used to replace things using regular expressions, and more. This will strip everything (.*) after the first | up to the end of the line ($). (While | normally means the alternation operator in regular expressions, in sed it needs to be escaped to work that way; otherwise it means just a plain |.)
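
As a quick illustration (the host name and trailing data below are made-up sample values), a line in the server list is reduced to just the host name:

echo 'ftp.example.com|unimportant data' | sed 's/|.*$//'
# prints: ftp.example.com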

So now you have a list of servers. How do you pass them to your script? With xargs! xargs will append each line as if it were an additional argument to your executable. For example

echo -e "1\n2"|xargs echo fixed_argument

will run

echo fixed_argument 1 2

So you should do

sed 's/|.*$//' server-list.txt | xargs parallel -j30 ./script :::
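
As a side note (not part of the original answer), GNU/parallel can also read its arguments directly from standard input when no ::: is given, so the xargs step can usually be dropped:

sed 's/|.*$//' server-list.txt | parallel -j30 ./script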

Caveats

Be sure not to save the results to the same file in each parallel task, otherwise the file will get corrupted - coreutils are simple and don't implement any locking mechanisms unless you add them yourself. That's why I redirected the output to $server-files.txt rather than files.txt.
