匹配令牌序列 [英] Matching token sequences

查看:70
本文介绍了匹配令牌序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一组n个令牌(例如a,b,c)分布在其他令牌中.我想知道我集合中的所有成员是否都出现在给定的位置数(窗口大小)内.在我看来,可能可以编写RegEx来捕获此状态,但是确切的语法使我难以理解.

I have a set of n tokens (e.g., a, b, c) distributed among a bunch of other tokens. I would like to know if all members of my set occur within a given number of positions (window size). It occurred to me that it may be possible to write a RegEx to capture this state, but the exact syntax eludes me.


          11111
012345678901234
ab ab bc  a cba

在此示例中,给定窗口大小= 5,我想在位置12-14处匹配cba,在位置3-7处匹配abc.

In this example, given window size=5, I would like to match cba at positions 12-14, and abc in positions 3-7.

是否可以使用RegEx做到这一点,或者可以使用其他语法来捕获这种逻辑?

Is there a way to do this with RegEx, or is there some other kind of grammar that I can use to capture this logic?

我希望用Java来实现.

I am hoping to implement this in Java.

推荐答案

这是一个正则表达式,可匹配包含所有'a','b'和'c'的5个字母的序列:

Here's a regex that matches 5-letter sequences that include all of 'a', 'b' and 'c':

(?=.{0,4}a)(?=.{0,4}b)(?=.{0,4}c).{5}

因此,虽然基本上匹配任何5个字符(使用.{5}),但必须遵守三个先决条件.它们每个都需要显示一个令牌/字母(最多4个字符,后跟"a"等). (?=X)匹配"X,且正向宽度为零",零宽度表示匹配时字符位置不会移动.

So, while basically matching any 5 characters (with .{5}), there are three preconditions the matches have to observe. Each of them requires one of the tokens/letters to be present (up to 4 characters followed by 'a', etc.). (?=X) matches "X, with a zero-width positive look-ahead", where zero-width means that the character position is not moved while matching.

使用正则表达式执行此操作的速度很慢..这是一个更直接的版本(似乎比使用正则表达式快15倍):

Doing this with regexes is slow, though.. Here's a more direct version (seems about 15x faster than using regular expressions):

public static void find(String haystack, String tokens, int windowLen) {
    char[] tokenChars = tokens.toCharArray();
    int hayLen = haystack.length();

    int pos = 0;
    nextPos:
    while (pos + windowLen <= hayLen) {
        for (char c : tokenChars) {
            int i = haystack.indexOf(c, pos);
            if (i < 0) return;

            if (i - pos >= windowLen) {
                pos = i - windowLen + 1;
                continue nextPos;
            }
        }

        // match found at pos
        System.out.println(pos + ".." + (pos + windowLen - 1) + ": " + haystack.substring(pos, pos + windowLen));
        pos++;
    }
}

这篇关于匹配令牌序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆