Get group names in Java regex


Question

I'm trying to take both a pattern and a string and return a map of group name -> matched result.

Example:

(?<user>.*)

I would like to return a map containing "user" as a key and whatever it matches as its value.

The problem is that I can't seem to get the group names from the Java regex API. I can only get the matched values by name or by index. I don't have the list of group names, and neither Pattern nor Matcher seems to expose this information. I have checked the source and the information appears to be there; it's just not exposed to the user.

I tried both Java's java.util.regex and jregex. (I don't mind if someone suggests another library that supports this feature, as long as it is good, well supported, and performs well.)

Recommended answer

There is no API in Java to obtain the names of the named capturing groups (this holds for versions before Java 20, which added Pattern.namedGroups()). I think this is a missing feature.

The easy way out is to pick out candidate named capturing groups from the pattern text, then try to access each candidate by name on a match. In other words, you don't know the exact names of the named capturing groups until you plug in a string that matches the whole pattern: a candidate that is not a real group makes Matcher.group(name) throw IllegalArgumentException, so it can be discarded.

The pattern to capture the names of the named capturing groups is \(\?<([a-zA-Z][a-zA-Z0-9]*)> (derived from the Pattern class documentation).
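To illustrate why the names found this way are only candidates, here is a minimal sketch that applies the pattern above to the text of a regex. The class name CandidateNames and the method candidates are hypothetical names chosen for this example. The second call shows a false positive: text quoted with \Q...\E looks like a named group to the scan but is not one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of java.util.regex.
class CandidateNames {

    // Matches "(?<name>" and captures the name in group 1.
    private static final Pattern NAMED_GROUP =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Scans the regex text and collects every candidate group name, in order.
    static List<String> candidates(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Two real named groups.
        System.out.println(candidates("(?<user>\\w+)@(?<host>\\w+)"));
        // "(?<fake>" sits inside \Q...\E, so it is literal text, yet the scan reports it.
        System.out.println(candidates("\\Q(?<fake>\\E(?<real>\\d+)"));
    }
}
```

Lookbehind constructs such as (?<= and (?<! are not reported, because the character after "<" must be a letter.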

(The hard way is to implement a parser for the regex syntax and get the names of the capturing groups from it.)

Example implementation:

import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;
import java.util.Iterator;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.regex.MatchResult;

class RegexTester {

    public static void main(String args[]) {
        Scanner scanner = new Scanner(System.in);

        String regex = scanner.nextLine();
        StringBuilder input = new StringBuilder();
        while (scanner.hasNextLine()) {
            input.append(scanner.nextLine()).append('\n');
        }

        Set<String> namedGroups = getNamedGroupCandidates(regex);

        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        int groupCount = m.groupCount();

        int matchCount = 0;

        if (m.find()) {
            // Remove invalid groups
            Iterator<String> i = namedGroups.iterator();
            while (i.hasNext()) {
                try {
                    m.group(i.next());
                } catch (IllegalArgumentException e) {
                    i.remove();
                }
            }

            matchCount += 1;
            System.out.println("Match " + matchCount + ":");
            System.out.println("=" + m.group() + "=");
            System.out.println();
            printMatches(m, namedGroups);

            while (m.find()) {
                matchCount += 1;
                System.out.println("Match " + matchCount + ":");
                System.out.println("=" + m.group() + "=");
                System.out.println();
                printMatches(m, namedGroups);
            }
        }
    }

    private static void printMatches(Matcher matcher, Set<String> namedGroups) {
        for (String name: namedGroups) {
            String matchedString = matcher.group(name);
            if (matchedString != null) {
                System.out.println(name + "=" + matchedString + "=");
            } else {
                System.out.println(name + "_");
            }
        }

        System.out.println();

        // Group indices run from 1 to groupCount() inclusive.
        for (int i = 1; i <= matcher.groupCount(); i++) {
            String matchedString = matcher.group(i);
            if (matchedString != null) {
                System.out.println(i + "=" + matchedString + "=");
            } else {
                System.out.println(i + "_");
            }
        }

        System.out.println();
    }

    private static Set<String> getNamedGroupCandidates(String regex) {
        Set<String> namedGroups = new TreeSet<String>();

        Matcher m = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>").matcher(regex);

        while (m.find()) {
            namedGroups.add(m.group(1));
        }

        return namedGroups;
    }
}

There is a caveat to this implementation, though. It currently doesn't work with a regex in Pattern.COMMENTS mode.
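The question asks for a map of group name -> matched value rather than printed output. A minimal sketch of that, using the same candidate-then-verify approach, could look like the following. The class name NamedGroupMapper and the method namedGroups are hypothetical names for this example. It only looks at the first match, and it shares the Pattern.COMMENTS caveat.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of java.util.regex.
class NamedGroupMapper {

    // Matches "(?<name>" and captures the name in group 1.
    private static final Pattern NAMED_GROUP =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Returns group name -> matched text for the first match of regex in input.
    // The map is empty if there is no match; a group that did not take part
    // in the match maps to null.
    static Map<String, String> namedGroups(String regex, CharSequence input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return result;
        }
        Matcher names = NAMED_GROUP.matcher(regex);
        while (names.find()) {
            String name = names.group(1);
            try {
                result.put(name, m.group(name));
            } catch (IllegalArgumentException e) {
                // The candidate is not a real group (e.g. quoted text); skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(namedGroups("(?<user>\\w+)@(?<host>\\w+)", "alice@example"));
    }
}
```

A LinkedHashMap keeps the groups in the order they appear in the pattern; any other Map implementation that accepts null values would work as well.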
