Perl Threads and Unsafe Signals

Question

I recently wanted to thread one of my Perl programs to increase its speed. Taking in a list of websites, I wanted to start a thread for each URL, fetch the content of each website, and then look for a company description on the page. Once one thread found a result, or all threads failed to find one, I wanted to exit, write my result, and read in the URLs for my next company.

The problem I see is that I use the Perl::Unsafe::Signals module inside the function that I call when creating a thread. I need the unsafe signals to interrupt regular expressions that get "stuck". However, this seems to cause all sorts of problems, mainly the program crashing with the error message "Alarm Clock".

Therefore, is there a way to use Perl::Unsafe::Signals and threads safely? Is there another way to time out a regular expression, for example by sending a signal to the function (like the 'KILL' signal I send below)? Thanks.

Note: I stripped the code down to the pertinent parts; let me know if you need more.

use threads ('exit' => 'threads_only');
use threads::shared;
my @descrip;
share(@descrip);

my $lock;
share($lock);

URL:foreach my $url(@unique_urls) {
        #skip blank urls
        if(!$url) { next URL; }#if

        #find description
        my $thread = threads->create(\&findCompanyDescription, $PREV_COMPANY, $PREV_BASE_URL, $url);
}#foreach

#while a description has not been found and there are still active threads, keep looking
#there may be a better way to do this, but this seems to work for me
while(!@descrip && threads->list() != 0) {;}

#kill all threads, write output, read in next batch of urls
my @threads = threads->list();
foreach(@threads) { print("detaching\n"); $_->kill('KILL')->detach(); }#foreach

#######SUBROUTINE CALLED BY THREAD CREATE

sub findCompanyDescription {
    my($company_full, $base_url, $url) = @_;
    my($descrip, $raw_meta, $raw) = '';
    my @company;

    $SIG{'KILL'} = sub { alarm(0); threads->exit(); };

    eval {
        local $SIG{ALRM} = sub { die("alarm\n") }; # NB: \n required
        alarm(5);

        use Perl::Unsafe::Signals;
        UNSAFE_SIGNALS {

            while($company) {
            my @matches = ($content =~ m!.*<([\w\d]+).*?>\s*about\s+$company[\w\s\-_]*<.*?>(?:<.*?>|\s)*(.*?)</\1.*?>!sig);

            MATCH:for(my $ndx=1; $ndx<@matches; $ndx+=2) {
            ($raw, $descrip) = &filterResult($matches[$ndx], $company_full);

            if($descrip) {
                $company = undef;
                last(MATCH);
            }#if
        }#for

        #reduce the company name and try again
        $company = &reduceCompanyName($company);

        }#while

        alarm(0);
         };#unsafe_signals
    };#eval 

    if($@) {
        if($@ eq "alarm\n" && $DEBUG) { print("\nWebpage Timeout [].\n"); }#if
    }#if

    if($descrip) {
        lock($lock);   # held until the end of this block
        @descrip = ($PREV_ID, $company_full, $base_url, $url, 1, $raw, $descrip);
    }#if
}#findCompanyDescription

Recommended answer

In general, "unsafe" signals are unsafe in both single-threaded and multi-threaded programs. You've only increased your peril by combining threads with unsafe signals. Perl's usual safe signal handler sets the flag signal_pending without meaningfully interrupting execution. The VM checks that flag when it's between opcodes.
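
A minimal sketch of that deferred delivery, assuming a Unix-like system where SIGUSR1 exists (the @log array is only illustrative). The handler is ordinary Perl code that runs once the interpreter reaches a safe point, not in the middle of an operation. It cannot demonstrate a stuck opcode without actually hanging, so it only shows the ordering.

```perl
use strict;
use warnings;

my @log;

# A "safe" handler: Perl only notes that the signal arrived and runs this
# sub later, when the interpreter is between operations.
$SIG{USR1} = sub { push @log, 'handler' };

push @log, 'before';
kill 'USR1', $$;          # send the signal to ourselves
push @log, 'after';

print join( ' -> ', @log ), "\n";
```

If a single opcode such as a regexp match runs for a long time, the handler simply waits until that opcode returns, which is why a plain alarm cannot break into a stuck match.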

Your regexp execution is a single, "atomic" opcode. Of course, the regexp engine is itself another VM with its own opcodes, but the Perl signal handler currently has no visibility into it.

Frankly, I have no good idea how to interrupt the regexp engine. It has some global C state which, prior to perl-5.10, prevented it from being reentrant. It might not be safe for the kind of universal interruptibility you're attempting. If you really want it to be fully interruptible, you could fork, have the child process run the regexp, and communicate the results back over a pipe.

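require JSON;
require IO::Select;

my $TIMEOUT_SECONDS = 2.5;

my ( $read, $write );
pipe $read, $write or die "pipe failed: $!";

my @matches;
my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ( $pid ) {
    # Parent: close our copy of the write end, otherwise the slurp below
    # never sees EOF and blocks forever.
    close $write;

    my $select = IO::Select->new( $read );
    if ( $select->can_read( $TIMEOUT_SECONDS ) ) {
        local $/;    # slurp everything the child wrote
        my $json = <$read>;
        if ( $json ) {
            my $matches_ref = JSON::from_json( $json );
            if ( $matches_ref ) {
                @matches = @$matches_ref;
            }
        }
    }
    else {
        # Timed out: the regexp is presumably stuck, so kill the child;
        # otherwise waitpid would block until the match finishes.
        kill 'KILL', $pid;
    }
    close $read;
    waitpid $pid, 0;
}
else {
    # Child: run the potentially slow regexp and send the captures back as JSON.
    close $read;
    my @r = $content =~ m!.*<([\w\d]+).*?>\s*about\s+$company[\w\s\-_]*<.*?>(?:<.*?>|\s)*(.*?)</\1.*?>!sig;
    my $json = JSON::to_json( \@r );
    print { $write } $json;
    close $write;
    exit;
}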
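
The fork-and-pipe idea can also be wrapped in a reusable helper. The sketch below is one possible shape, not the answerer's code: run_with_timeout is a hypothetical name, JSON::PP is swapped in for JSON only because it ships with Perl, and the sample $content and pattern are made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP   ();
use IO::Select ();
use POSIX      ();

# Hypothetical helper: run $code in a child process and return its list of
# results as an array ref, or undef if nothing arrives within $timeout seconds.
sub run_with_timeout {
    my ( $code, $timeout ) = @_;

    pipe( my $read, my $write ) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ( !$pid ) {
        # Child: do the risky work and send the result back as JSON.
        close $read;
        my @result = $code->();
        print {$write} JSON::PP::encode_json( \@result );
        close $write;
        POSIX::_exit(0);    # leave without running the parent's END blocks
    }

    # Parent: drop the write end so that reading stops at EOF.
    close $write;
    my $result;
    if ( IO::Select->new($read)->can_read($timeout) ) {
        local $/;
        my $json = <$read>;
        $result = JSON::PP::decode_json($json)
            if defined $json && length $json;
    }
    else {
        kill 'KILL', $pid;    # the child is presumably stuck
    }
    close $read;
    waitpid $pid, 0;
    return $result;
}

# Usage: the match runs in the child, so a runaway regexp cannot hang the caller.
my $content = '<p>about Acme</p>';
my $found   = run_with_timeout( sub { $content =~ m{<(\w+)>(.*?)</\1>}g }, 2.5 );
print defined $found ? "matched: @$found\n" : "timed out\n";
```

Because each call pays for a fork, this trades speed for safety: the parent never installs an unsafe signal handler, and a stuck match is dealt with by killing a separate process.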
