Cancelling a long-running regex match?

Question

Say I'm running a service where users can submit a regex to search through lots of data. If a user submits a regex that is very slow (i.e., Matcher.find() takes minutes to return), I want a way to cancel that match. The only way I can think of doing this is to have another thread monitor how long a match is taking and use Thread.stop() to cancel it if necessary.

Member variables:

long REGEX_TIMEOUT = 30000L;
Object lock = new Object();
boolean finished = false;
Thread matcherThread;

Matcher thread:

try {
    matcherThread = Thread.currentThread();

    // imagine code to start monitor thread is here

    try {
        matched = matcher.find();
    } finally {
        synchronized (lock) {
            finished = true;
            lock.notifyAll();
        }
    }
} catch (ThreadDeath td) {
    // send angry message to client
    // handle error without rethrowing td
}

Monitor thread:

synchronized (lock) {
    while (! finished) {
        try {
            lock.wait(REGEX_TIMEOUT);

            if (! finished) {
                matcherThread.stop();
            }
        } catch (InterruptedException ex) {
            // ignore, top level method in dedicated thread, etc..
        }
    }
}

I've read java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html, and I think this usage is safe: I control where ThreadDeath is thrown via synchronisation, I handle it, and the only objects that could be damaged are my Pattern and Matcher instances, which will be discarded anyway. I think this misuses Thread.stop() because I'm not rethrowing the error, but I don't really want the thread to die, just to abort the find() method.

I've managed to avoid using these deprecated API components so far, but Matcher.find() does not seem to be interruptible and can take a very long time to return. Is there any better way to do this?

Recommended answer

From Heritrix (crawler.archive.org):

/**
 * CharSequence that notices thread interrupts, as might be necessary
 * to recover from a loose regex on unexpected challenging input.
 * 
 * @author gojomo
 */
public class InterruptibleCharSequence implements CharSequence {
    CharSequence inner;
    // public long counter = 0; 

    public InterruptibleCharSequence(CharSequence inner) {
        super();
        this.inner = inner;
    }

    public char charAt(int index) {
        if (Thread.interrupted()) { // clears flag if set
            throw new RuntimeException(new InterruptedException());
        }
        // counter++;
        return inner.charAt(index);
    }

    public int length() {
        return inner.length();
    }

    public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

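Wrap your CharSequence in this class and Thread.interrupt() will work: the regex engine reads its input through charAt(), so once the matching thread is interrupted, the next charAt() call throws a RuntimeException (wrapping an InterruptedException) and find() aborts without the thread dying.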
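A minimal sketch of how the wrapper might be used in the service from the question, assuming the match runs on an ExecutorService: Future.get(timeout) bounds the wait, and Future.cancel(true) interrupts the worker, which the wrapper turns into an exception inside find(). The names RegexWithTimeout and findWithTimeout are hypothetical, and the wrapper is repeated in compact form so the example compiles on its own.

```java
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Heritrix or the JDK.
final class RegexWithTimeout {

    // Compact copy of the InterruptibleCharSequence idea above, so this file
    // compiles on its own: every charAt() checks (and clears) the interrupt flag.
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            if (Thread.interrupted()) {
                throw new RuntimeException(new InterruptedException());
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    private RegexWithTimeout() {}

    /**
     * Runs find() on the executor and waits at most timeoutMillis.
     * Returns the match result, or Optional.empty() if the deadline passed.
     */
    static Optional<Boolean> findWithTimeout(ExecutorService executor, Pattern pattern,
                                             CharSequence input, long timeoutMillis) {
        Future<Boolean> future = executor.submit(
                () -> pattern.matcher(new InterruptibleCharSequence(input)).find());
        try {
            return Optional.of(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupts the worker if still running; its next charAt() throws.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            // The caller was interrupted while waiting: stop the match too.
            future.cancel(true);
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            // find() itself threw; surface the original cause.
            throw new RuntimeException(e.getCause());
        }
    }
}
```

A timed-out call returns Optional.empty() while the pool thread survives for the next request, which matches the goal of aborting find() without killing the thread. Only input read through the wrapper notices the interrupt; a Matcher created directly on a plain String is not affected.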
