How to stop forking in this code


Problem description

I have a Perl script that uses wget to download the pieces of a stream. I don't know upfront how many pieces there are.

I can't think of a good way to know when to stop calling wget. Right now, if wget fails, the child creates a file called "END", and once the main program sees that file it stops the loop. Is there a better way to do this?

Obviously this would be easy if it were done sequentially instead of concurrently, but I am trying to make the download as fast as possible.

my $link = $ARGV[0];
my ($url) = $link=~ m/(.+-)\d+.ts/i;

my $num = 0;

#while the file END doesn't exist
my @pids;
while (! -e "END") {
        #create the URL, increment by 1
        my $video=$url.++$num.".ts";
        die "could not fork" unless defined (my $pid = fork());

        #child process goes until wget returns invalid, create END
        if (not $pid) {
                system ("wget -T 5 -t 5 $video");
                `touch END` if $? != 0;
                exit;
        }
        push @pids, $pid;
}

#parent process still running, waiting for the same END file.
for my $pid (@pids) { waitpid $pid,0; }

print "pids finished\n";

sleep 1;
`rm END`;

Solution

You don't say how many processes there may be, but no resource is unlimited, so you should cap the number; otherwise performance degrades rapidly as you reach saturation.

This matters even more when going out on the network, since you may annoy the server (and the downloads will stop getting faster quite soon anyway). Perhaps run up to a few tens of processes at a time?

One option, then, is to limit the number of parallel downloads using Parallel::ForkManager. It has a way to return data to the parent, so a child can report failure. A callback registered with its run_on_finish method can then check each batch for such a failure flag and set a variable that controls the forking.

use warnings;
use strict;
use Parallel::ForkManager;    

my $pm = Parallel::ForkManager->new(2);  # only 2 for a manageable demo
my $stop_forking;

# The sub gets 6 parameters, but only first (pid) is always defined
# The last one is what a child process may have passed
$pm->run_on_finish(  
    sub { $stop_forking = 1 if defined $_[-1] } 
); 

for my $i (0..9)
{
    last if $stop_forking;

    $pm->start and next;    # forks
    my $ret = run_job($i);  # child process

    # Pass data to parent under a condition
    if ($ret eq 'FAIL') {  $pm->finish(0, \$ret) }  # child exits 
    else                {  $pm->finish }
}
$pm->wait_all_children;

sub run_job { 
    my ($i) = $_[0];
    sleep 2;
    print "Child: job $i exiting\n";
    return ($i == 3 ? 'FAIL' : 1);
}

This stops forking after the batch of jobs that contains $i == 3. Add print statements for diagnostics.

The run_on_finish callback runs only once a whole batch completes. The anonymous sub passed to it always receives 6 arguments, but only the first one, the child pid, is always defined. The last one holds data possibly passed by the child, and when it is defined we set the flag. A child returns data by passing a reference to the finish method. To merely signal a condition we can pass a reference to anything; I use \$ret as an example of passing actual data.

See the documentation for more, but this does what you ask. For far more functionality, see Forks::Super.


If you wish to keep forking the way you do, I'd first put a short sleep in the loop so you don't bombard the server with too many requests. Your children can talk to the parent over a socketpair. The child that fails writes to its socket, while all the others simply close theirs. The parent keeps checking for input, for example with can_read from IO::Select. There is an example in perlipc. Since you only need the children to write to the parent, a pipe would suffice as well.
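A minimal sketch of the pipe variant, using only core modules. The names fork_until_failure, $job and $max_kids are made up for illustration: $job is a code ref that stands in for the wget call and returns true on success. Failing children write their job number to a shared pipe, and the parent polls it with can_read before each fork.

```perl
use strict;
use warnings;
use IO::Select;
use List::Util qw(min);
use POSIX ();

# Fork one child per job number until a child reports failure over a pipe.
# $job (hypothetical) is a code ref returning true on success.
# $max_kids caps the number of concurrent children.
sub fork_until_failure {
    my ($job, $max_kids) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $sel = IO::Select->new($reader);

    my ($num, $running) = (0, 0);
    while (1) {
        # At the cap: block until one child exits before forking another
        if ($running >= $max_kids) { wait(); --$running; }

        # Poll without blocking; readable data means some child failed
        my @ready = $sel->can_read(0);
        last if @ready;

        ++$num;
        my $pid = fork();
        die "could not fork: $!" unless defined $pid;
        if ($pid == 0) {
            close $reader;
            # Only a failing child writes; one short line goes out atomically
            syswrite($writer, "$num\n") unless $job->($num);
            POSIX::_exit(0);    # leave without running the parent's END blocks
        }
        ++$running;
    }

    close $writer;              # the reader sees EOF once all children are gone
    1 while wait() != -1;       # reap every remaining child
    chomp(my @failed = <$reader>);
    return min(@failed);        # lowest job number that reported failure
}
```

For the downloads in the question, $job could be something like sub { system('wget', '-T', 5, '-t', 5, "$url$_[0].ts") == 0 }, with $url as computed in the original script. If no job ever fails the loop never ends, so a real script would probably also want an upper bound on the job number.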

You can also do it with a signal. The child that fails sends (say) SIGUSR1 to the parent; the parent traps it and sets a global variable that controls further forks. This is simpler, since the parent traps just that one signal and doesn't care which child it came from. See perlipc and the sigtrap pragma.
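The signal approach might look roughly like this. Again fork_until_signal, $job and $max_kids are illustrative names, and $job stands in for the wget call. The handler only sets a flag, which the loop checks before every fork.

```perl
use strict;
use warnings;
use POSIX ();

# Fork one child per job number until a failing child signals the parent.
# $job (hypothetical) is a code ref returning true on success.
# Returns how many jobs were started; a signal carries no payload, so the
# parent cannot tell which job failed.
sub fork_until_signal {
    my ($job, $max_kids) = @_;

    my $stop = 0;
    local $SIG{USR1} = sub { $stop = 1 };   # the handler only sets a flag

    my ($num, $running) = (0, 0);
    until ($stop) {
        # At the cap: reap one child, then re-check the flag before forking
        if ($running >= $max_kids) { wait(); --$running; next; }

        ++$num;
        my $pid = fork();
        die "could not fork: $!" unless defined $pid;
        if ($pid == 0) {
            # A failing child notifies the parent and exits
            kill 'USR1', getppid() unless $job->($num);
            POSIX::_exit(0);
        }
        ++$running;
    }

    1 while wait() != -1;   # reap the children that are still running
    return $num;
}
```

The handler stays installed until every child has been reaped, so a late SIGUSR1 from a straggler cannot hit the default disposition and kill the parent.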

You can also use a file, much as you do now. This is probably the simplest, since here you don't care about race conditions (whether the children's writes overlap) but only about an empty file showing up.

In all of these, however, you'd still want to limit the number of parallel processes.
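As a sketch of the file-based variant with such a limit: the names fork_until_flag_file, $job and $max_kids are hypothetical, and $job stands in for the wget call. The flag file lives in a temporary directory (an assumption made here, so that a stale END file from an earlier run cannot stop the loop early).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX ();

# Fork one child per job number until a failing child creates a flag file.
# $job (hypothetical) is a code ref returning true on success.
# $max_kids caps the number of concurrent children.
sub fork_until_flag_file {
    my ($job, $max_kids) = @_;

    my $dir  = tempdir(CLEANUP => 1);   # private directory for this run
    my $flag = "$dir/END";

    my ($num, $running) = (0, 0);
    until (-e $flag) {
        # At the cap: reap one child, then re-check the flag before forking
        if ($running >= $max_kids) { wait(); --$running; next; }

        ++$num;
        my $pid = fork();
        die "could not fork: $!" unless defined $pid;
        if ($pid == 0) {
            unless ($job->($num)) {
                # An empty file is enough; overlapping writers are harmless
                open my $fh, '>', $flag or POSIX::_exit(1);
                close $fh;
            }
            POSIX::_exit(0);
        }
        ++$running;
    }

    1 while wait() != -1;   # reap the children that are still running
    unlink $flag;
    return $num;            # number of jobs started
}
```

Compared with the original loop, this avoids shelling out to touch and rm, and the cap keeps the parent from forking without bound while it waits for the file to appear.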

Finally, there are also modules that help with running external commands, for example IPC::Run.


To run the callback as soon as each child exits, use reap_finished_children. See this post.
