Running a loop bash curl script with GNU Parallel


Problem description

Just recently started programming in bash and came across GNU Parallel, which is exactly what I need for my project. I have a basic loop script that is meant to loop through a list of IPs and ping each one once. The list of IPs is constantly updated with new ones by another script.

For multithreading, I would like to use GNU Parallel.

My idea was to run 10 parallel instances; each would grab one IP from the list, insert it into the curl command, and remove it from the list so the other instances won't pick it up.

#!/bin/bash
# Loop over the list of IPs, curl each one, and drop it from the list
# so it is not picked up again.
while true; do

  while read -r ip; do
    curl "$ip" >> result.txt
    sed -i '1d' ipslist    # remove the line that was just processed
  done < ipslist
done

I'm not sure what the right way to run the bash script is in this case; every solution I could find doesn't work properly and things get totally messy. I have a feeling this can all be done in a single line, but for my own reasons I'd prefer to run it as a bash script. Would be grateful for any help!

Recommended answer

Thomas' solution looks like the correct one for this particular situation. If, however, you need to do more than simply curl, then I recommend making a function:

#!/bin/bash

doit() {
  ip="$1"
  curl "$ip"
  echo do other stuff here
}
# Export the function so the shells that GNU Parallel spawns can see it.
export -f doit

while true; do
  # Run up to 10 jobs in parallel, one IP per line read from ipslist.
  parallel -j10 doit < ipslist >> result.txt
done
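
As an illustration of "do other stuff here", a hypothetical variant of doit could save each response to its own file and print the IP together with the HTTP status code. The filename scheme and curl flags below are just one way to sketch this, not part of the original answer:

doit() {
  ip="$1"
  # Slashes in the argument are replaced with underscores so it can be
  # used as a filename (hypothetical naming scheme).
  status=$(curl -s -o "response_${ip//\//_}.html" -w '%{http_code}' "$ip")
  echo "$ip $status"
}
export -f doit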

If you instead want ipslist to be a queue, so that you can add entries to it later and want each one curled only once:

tail -n+0 -f ipslist | parallel doit >> result.txt

Now you can simply add entries to ipslist later and GNU Parallel will curl those, too.
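
For example, another script (or you, by hand) could append new targets while the tail | parallel pipeline above is running; the addresses below are placeholders from the documentation range:

# Hypothetical example: append new targets while the pipeline is running;
# tail -f feeds them to GNU Parallel as they arrive.
echo "192.0.2.10" >> ipslist
echo "192.0.2.11" >> ipslist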

(There is a small issue when using GNU Parallel as a queue system/batch manager: you have to submit JobSlot number of jobs before they will start, and after that you can submit one at a time; a job will start immediately if a free slot is available. Output from running or completed jobs is held back and will only be printed when JobSlots more jobs have been started (unless you use --ungroup or --line-buffer, in which case the output from the jobs is printed immediately). E.g. if you have 10 job slots, the output from the first completed job will only be printed when job 11 has started, and the output of the second completed job only when job 12 has started.)
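
If that hold-back matters, the same queue pipeline can be run with --line-buffer (one of the options mentioned above) so each job's output is printed as it is produced; this is simply the previous command with the extra flag:

# Same queue pipeline, but output is printed as it is produced instead of
# being held back until later jobs start.
tail -n+0 -f ipslist | parallel --line-buffer doit >> result.txt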

