Regex not working in Java while working otherwise

Problem description

I made a regular expression: https://regex101.com/r/ToCwrE/2/

All it should do is extract the function's parameters. I am trying to achieve this with capture groups.

[\s]*javascript:[\s]*m\((-?\d+)[\s]*,[\s]*(-?\d+)[\s]*,[\s]{0,}encodeURIComponent\(\'([^\']+)*\'\)[\s]*,[\s]*(-?\d+)\)[\s]*

Tried it on:

javascript:m(53009,2,encodeURIComponent('7711T'), 22)
javascript:m(52992,2,encodeURIComponent('3013'), 2)
javascript:m(10440,2,encodeURIComponent('F Series'), 11)
javascript:m(53022,2,encodeURIComponent('C 12045'), 85)
javascript:m(53045,2,encodeURIComponent('Prox 8441'), 16)
javascript:m(26016,2,encodeURIComponent('Vard   asd .ious'), 22)

On regex101 and a few similar sites, it correctly returns the matched groups. However, when I try to use it in Java, I don't get the capture groups; only the whole matched text is returned.

If I copy and paste it into IDEA, it gets escaped automatically (each \ is replaced with \\):

Pattern pattern = Pattern.compile("[\\s]*javascript:[\\s]*m\\((-?\\d+)[\\s]*,[\\s]*(-?\\d+)[\\s]*,[\\s]{0,}encodeURIComponent\\(\\'([^\\']+)*\\'\\)[\\s]*,[\\s]*(-?\\d+)\\)[\\s]*");
Matcher m = pattern.matcher("javascript:m(53022,2,encodeURIComponent('Cr 12045'), 85)");
List<String> groups = new ArrayList<>();
while (m.find()) {
    groups.add(m.group());
}
System.out.println(groups); // prints only the whole matched text, not the capture groups

What am I missing? How should the regex be converted to get it working in Java?

Solution

To get the content of each group, you can use Matcher#group(number) or Matcher#group(name). In your case, to get the content of the first group, use m.group(1), which returns 53022.

The problem with m.group() is that it is the same as m.group(0), so it returns the content of group 0, which holds the match for the whole pattern.
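The difference between these calls can be sketched as follows. This is only an illustration: it reuses the pattern from the question, and the group names `first`, `second`, `text` and `last` are hypothetical names added here so that Matcher#group(name) has something to look up. Named groups are still numbered, so both kinds of access work on the same pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupAccessDemo {
    // The question's pattern with hypothetical names added to the four groups.
    static final Pattern P = Pattern.compile(
            "[\\s]*javascript:[\\s]*m\\((?<first>-?\\d+)[\\s]*,[\\s]*(?<second>-?\\d+)[\\s]*,"
            + "[\\s]{0,}encodeURIComponent\\(\\'(?<text>[^\\']+)*\\'\\)[\\s]*,"
            + "[\\s]*(?<last>-?\\d+)\\)[\\s]*");

    // Returns the content of group n for the first match, or null if nothing matches.
    static String byNumber(String input, int n) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(n) : null;
    }

    // Returns the content of the named group for the first match, or null if nothing matches.
    static String byName(String input, String name) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String input = "javascript:m(53022,2,encodeURIComponent('Cr 12045'), 85)";
        Matcher m = P.matcher(input);
        if (m.find()) {
            System.out.println(m.group());        // whole match, identical to m.group(0)
            System.out.println(m.group(0));       // whole match
            System.out.println(m.group(1));       // 53022
            System.out.println(m.group("first")); // 53022 (same group, accessed by name)
            System.out.println(m.group("text"));  // Cr 12045
        }
    }
}
```

Whether names are worth adding is a matter of taste; with only four groups, numeric access is probably enough.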

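To iterate over all groups, use a simple for loop. To get the number of groups in the pattern dynamically, use Matcher#groupCount. Group 0 is not included in that count, so the loop runs from 1 to groupCount() inclusive.

So, to collect the results from all groups, you can use: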

Pattern p = Pattern.compile("[\\s]*javascript:[\\s]*m\\((-?\\d+)[\\s]*,[\\s]*(-?\\d+)[\\s]*,[\\s]{0,}encodeURIComponent\\(\\'([^\\']+)*\\'\\)[\\s]*,[\\s]*(-?\\d+)\\)[\\s]*");
Matcher m = p.matcher("javascript:m(53022,2,encodeURIComponent('Cr 12045'), 85)");
List<String> groups = new ArrayList<>();
while (m.find()) {
    for (int i=1; i<=m.groupCount(); i++){
        groups.add(m.group(i));
    }
}

System.out.println(groups); //[53022, 2, Cr 12045, 85]

BTW

  • \s is already a character class, so it doesn't need to be nested in [..]; instead of [\\s]* you can write \\s*.
  • {0,} is the same as *, so I don't see any reason to mix the two; use * everywhere.
  • ' is not a regex metacharacter, so it doesn't need escaping.
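
Applying those three points to the pattern gives something like the sketch below. It is not part of the original answer's code: the class name `SimplifiedPattern` and the helper `extract` are made up for illustration, and the pattern is otherwise left unchanged (including the `(...)*` around the quoted text). It runs the result against the six sample inputs from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplifiedPattern {
    // Same pattern with \s instead of [\s], * instead of {0,}, and ' left unescaped.
    static final Pattern P = Pattern.compile(
            "\\s*javascript:\\s*m\\((-?\\d+)\\s*,\\s*(-?\\d+)\\s*,"
            + "\\s*encodeURIComponent\\('([^']+)*'\\)\\s*,"
            + "\\s*(-?\\d+)\\)\\s*");

    // Collects the contents of groups 1..groupCount() for every match in the input.
    static List<String> extract(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "javascript:m(53009,2,encodeURIComponent('7711T'), 22)",
            "javascript:m(52992,2,encodeURIComponent('3013'), 2)",
            "javascript:m(10440,2,encodeURIComponent('F Series'), 11)",
            "javascript:m(53022,2,encodeURIComponent('C 12045'), 85)",
            "javascript:m(53045,2,encodeURIComponent('Prox 8441'), 16)",
            "javascript:m(26016,2,encodeURIComponent('Vard   asd .ious'), 22)"
        };
        for (String input : inputs) {
            System.out.println(extract(input)); // first line printed: [53009, 2, 7711T, 22]
        }
    }
}
```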
