Regular Expression That Contains All Of The Specific Letters In Java

Question

I have a regular expression that selects all the words containing all (not just any) of a set of specific letters. It works fine in Notepad++.

Regular expression pattern:

^(?=.*B)(?=.*T)(?=.*L).+$

Input text file:

AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB

Output of the regular expression in Notepad++:

LABAT
BALAT
LATAB

Since it worked in Notepad++, I tried the same regular expression in Java, but it failed.

Here is my test code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import com.lev.kelimelik.resource.*;

public class Test {

    public static void main(String[] args) {
        String patternString = "^(?=.*B)(?=.*T)(?=.*L).+$";

        String dictionary = 
                "AL" + "\n"
                +"BAL" + "\n"
                +"BAK" + "\n"
                +"LABAT" + "\n"
                +"TAL" + "\n"
                +"LAT" + "\n"
                +"BALAT" + "\n"
                +"LA" + "\n"
                +"AB" + "\n"
                +"LATAB" + "\n"
                +"TAB" + "\n";

        Pattern p = Pattern.compile(patternString, Pattern.DOTALL);
        Matcher m = p.matcher(dictionary);
        while(m.find())
        {
            System.out.println("Match: " + m.group());
        }
    }

}

The output is wrong, as shown below:

Match: AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB

My question is simply: what is the Java-compatible version of this regular expression?

Recommended answer

Java-specific answer

In real life, we rarely need to validate lines, and I see that you are in fact just using the input as an array of test data. The most common scenario is reading input line by line and performing checks on each line. I agree that in Notepad++ the solution would be a bit different, but in Java each line should be checked separately.

That said, you should not copy the same approach across different platforms. What works well in Notepad++ is not necessarily a good fit for Java.
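As a side note on why the test code in the question prints the whole input as one match: with Pattern.DOTALL the dot also matches line terminators, and without Pattern.MULTILINE the ^ and $ anchors apply to the input as a whole rather than to each line. The following is a minimal sketch, not part of the original answer, that keeps the question's pattern and only swaps the flag to Pattern.MULTILINE; the class name MultilineLookaheadDemo and the helper findMatches are illustrative names chosen here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineLookaheadDemo {

    // Same pattern as in the question; only the compile flag differs.
    // MULTILINE makes ^ and $ match at line boundaries, and without DOTALL
    // the dot stops at line terminators, so each lookahead stays on one line.
    static final Pattern ALL_LETTERS =
            Pattern.compile("^(?=.*B)(?=.*T)(?=.*L).+$", Pattern.MULTILINE);

    // Collects every line of the text that contains B, T and L.
    static List<String> findMatches(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = ALL_LETTERS.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String dictionary =
                "AL\nBAL\nBAK\nLABAT\nTAL\nLAT\nBALAT\nLA\nAB\nLATAB\nTAB\n";
        for (String word : findMatches(dictionary)) {
            System.out.println("Match: " + word);
        }
    }
}
```

This keeps the Notepad++ style of matching against one multi-line string; the approaches below check each line separately instead.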

I suggest this almost regex-free approach (String#split() still uses a regex):

String dictionary_str = 
        "AL" + "\n"
        +"BAL" + "\n"
        +"BAK" + "\n"
        +"LABAT" + "\n"
        +"TAL" + "\n"
        +"LAT" + "\n"
        +"BALAT" + "\n"
        +"LA" + "\n"
        +"AB" + "\n"
        +"LATAB" + "\n"
        +"TAB" + "\n";
String[] dictionary = dictionary_str.split("\n"); // Split into lines
for (int i=0; i<dictionary.length; i++)   // Iterate through lines
{
    if(dictionary[i].indexOf("B") > -1 && // There must be B
       dictionary[i].indexOf("T") > -1 && // There must be T
       dictionary[i].indexOf("L") > -1)   // There must be L
    {
        System.out.println("Match: " + dictionary[i]); // No need matching, print the whole line
    }
}

See the IDEONE demo.
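If the set of required letters changes often, the three indexOf checks above could be folded into a small helper. This is only a sketch of that idea; the class LetterCheck and the method containsAll are hypothetical names introduced here, not part of the original answer.

```java
class LetterCheck {

    // Returns true only if every character of `required` occurs in `word`.
    static boolean containsAll(String word, String required) {
        for (int i = 0; i < required.length(); i++) {
            if (word.indexOf(required.charAt(i)) < 0) {
                return false; // one required letter is missing
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] dictionary = {"AL", "BAL", "BAK", "LABAT", "TAL", "LAT",
                               "BALAT", "LA", "AB", "LATAB", "TAB"};
        for (String word : dictionary) {
            if (containsAll(word, "BTL")) { // B, T and L must all be present
                System.out.println("Match: " + word);
            }
        }
    }
}
```

The check is case-sensitive, just like the indexOf calls it replaces.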

If you do want a regex, avoid relying on .* in the lookaheads: this construct tends to cause unnecessary backtracking. In this case, you can easily optimize the pattern with negated character classes and possessive quantifiers:

^(?=[^B]*+B)(?=[^T]*+T)(?=[^L]*+L)

Regex breakdown:

  • ^ - start of string
  • (?=[^B]*+B) - right at the start of the string, check that at least one B is present, possibly preceded by 0 or more characters other than B
  • (?=[^T]*+T) - still right at the start of the string, check that at least one T is present, possibly preceded by 0 or more characters other than T
  • (?=[^L]*+L) - still right at the start of the string, check that at least one L is present, possibly preceded by 0 or more characters other than L

See the Java demo:

String patternString = "^(?=[^B]*+B)(?=[^T]*+T)(?=[^L]*+L)";
String[] dictionary = {"AL", "BAL", "BAK", "LABAT", "TAL", "LAT", "BALAT", "LA", "AB", "LATAB", "TAB"};
Pattern p = Pattern.compile(patternString); // Compile once, reuse for every word
for (int i=0; i<dictionary.length; i++)
{
    Matcher m = p.matcher(dictionary[i]);   // Check each word separately
    if(m.find())
    {
        System.out.println("Match: " + dictionary[i]);
    }
}

Output:

Match: LABAT
Match: BALAT
Match: LATAB
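The same lookahead pattern can also be assembled for an arbitrary set of letters. The sketch below assumes the required characters are plain letters (characters that are special inside a character class, such as ^, ] or \, would need escaping); LookaheadBuilder and requireAll are illustrative names, not an existing API.

```java
import java.util.regex.Pattern;

class LookaheadBuilder {

    // Builds "^" followed by one (?=[^X]*+X) lookahead per required letter X.
    // Assumption: `letters` holds plain letters only, so no escaping is done.
    static Pattern requireAll(String letters) {
        StringBuilder sb = new StringBuilder("^");
        for (char c : letters.toCharArray()) {
            sb.append("(?=[^").append(c).append("]*+").append(c).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = requireAll("BTL"); // same pattern as in the demo above
        String[] dictionary = {"AL", "BAL", "BAK", "LABAT", "TAL", "LAT",
                               "BALAT", "LA", "AB", "LATAB", "TAB"};
        for (String word : dictionary) {
            if (p.matcher(word).find()) {
                System.out.println("Match: " + word);
            }
        }
    }
}
```

For the letters B, T and L this produces exactly the pattern shown earlier, so it selects the same words.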
