How do I create a Stream of regex matches?

Problem description

I am trying to parse standard input and extract every string that matches with a specific pattern, count the number of occurrences of each match, and print the results alphabetically. This problem seems like a good match for the Streams API, but I can't find a concise way to create a stream of matches from a Matcher.

I worked around this problem by implementing an iterator over the matches and wrapping it into a Stream, but the result is not very readable. How can I create a stream of regex matches without introducing additional classes?

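import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class PatternCounter
{
    static private class MatcherIterator implements Iterator<String> {
        private final Matcher matcher;
        private boolean pending; // a match was found but not yet returned
        public MatcherIterator(Matcher matcher) {
            this.matcher = matcher;
        }
        public boolean hasNext() {
            // Advance only when no match is pending, so repeated calls are safe.
            if (!pending) pending = matcher.find();
            return pending;
        }
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            pending = false;
            return matcher.group(0);
        }
    }

    static public void main(String[] args) throws Throwable {
        Pattern pattern = Pattern.compile("[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");

        new TreeMap<String, Long>(new BufferedReader(new InputStreamReader(System.in))
            .lines().map(line -> {
                Matcher matcher = pattern.matcher(line);
                return StreamSupport.stream(
                        Spliterators.spliteratorUnknownSize(new MatcherIterator(matcher), Spliterator.ORDERED), false);
            }).reduce(Stream.empty(), Stream::concat).collect(groupingBy(o -> o, counting()))
        ).forEach((k, v) -> {
            System.out.printf("%s\t%s\n",k,v);
        });
    }
}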


Recommended answer

Well, in Java 8, there is Pattern.splitAsStream which will provide a stream of items split by a delimiter pattern but unfortunately no support method for getting a stream of matches.
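To make the distinction concrete, here is a minimal sketch of what Pattern.splitAsStream returns. The class name SplitVersusMatch, the helper split and the sample input are made up for illustration. The stream holds the text between the matches, so the matched substrings themselves are dropped:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitVersusMatch {
    // Hypothetical helper: collects the pieces of `input` that lie BETWEEN
    // matches of `delimiter` (the matches themselves are not included).
    static List<String> split(Pattern delimiter, String input) {
        return delimiter.splitAsStream(input).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        // The digit runs act as delimiters here, so only the letters survive.
        System.out.println(split(digits, "ab12cd345ef")); // prints [ab, cd, ef]
    }
}
```

So splitAsStream only helps when the interesting text is what surrounds the pattern, which is the opposite of what the question needs.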

If you are going to implement such a Stream, I recommend implementing Spliterator directly rather than implementing and wrapping an Iterator. You may be more familiar with Iterator but implementing a simple Spliterator is straight-forward:

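import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;

// Spliterator that hands out the matches of a Matcher one at a time.
final class MatchItr extends Spliterators.AbstractSpliterator<String> {
    private final Matcher matcher;
    MatchItr(Matcher m) {
        // The region length only serves as a rough size estimate (SIZED is not reported).
        super(m.regionEnd()-m.regionStart(), ORDERED|NONNULL);
        matcher=m;
    }
    public boolean tryAdvance(Consumer<? super String> action) {
        if(!matcher.find()) return false; // no further match: the stream ends
        action.accept(matcher.group());
        return true;
    }
}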

You may consider overriding forEachRemaining with a straight-forward loop, though.
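A sketch of what that override might look like. The Spliterator is nested in a hypothetical wrapper class MatchStreams, and allMatches is an illustrative helper, neither of which is part of the original answer. The loop simply drains the Matcher without the per-element boolean round trip of tryAdvance:

```java
import java.util.List;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class MatchStreams {
    static final class MatchItr extends Spliterators.AbstractSpliterator<String> {
        private final Matcher matcher;

        MatchItr(Matcher m) {
            super(m.regionEnd() - m.regionStart(), ORDERED | NONNULL);
            matcher = m;
        }

        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            if (!matcher.find()) return false;
            action.accept(matcher.group());
            return true;
        }

        @Override
        public void forEachRemaining(Consumer<? super String> action) {
            // Bulk traversal: consume every remaining match in one plain loop.
            while (matcher.find()) action.accept(matcher.group());
        }
    }

    // Hypothetical helper: all matches of `p` in `input`, in encounter order.
    static List<String> allMatches(Pattern p, CharSequence input) {
        return StreamSupport.stream(new MatchItr(p.matcher(input)), false)
                            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(allMatches(Pattern.compile("[a-z]+"), "foo 42 bar")); // prints [foo, bar]
    }
}
```

Both methods share the same Matcher state, so a stream that starts with tryAdvance and finishes with forEachRemaining still sees each match exactly once.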

If I understand your attempt correctly, the solution should look more like:

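Pattern pattern = Pattern.compile(
                 "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)");

// System.console() returns null when no console is attached (for example,
// when input is piped in); new InputStreamReader(System.in) is an alternative.
try(BufferedReader br=new BufferedReader(System.console().reader())) {

    br.lines()
      // flatMap merges the per-line match streams into a single stream of matches
      .flatMap(line -> StreamSupport.stream(new MatchItr(pattern.matcher(line)), false))
      // TreeMap::new keeps the keys in alphabetical order
      .collect(Collectors.groupingBy(o->o, TreeMap::new, Collectors.counting()))
      .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}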






Java 9 provides a method Stream<MatchResult> results() directly on the Matcher. But for finding matches within a stream, there’s an even more convenient method on Scanner. With that, the implementation simplifies to

try(Scanner s = new Scanner(System.console().reader())) {
    s.findAll(pattern)
     .collect(Collectors.groupingBy(MatchResult::group,TreeMap::new,Collectors.counting()))
     .forEach((k, v) -> System.out.printf("%s\t%s\n",k,v));
}
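For input that is already available as a stream of lines, the Matcher.results() method mentioned above can be combined with flatMap. The following is a minimal sketch for Java 9 or later; the class name ResultsDemo, the helper countMatches and the sample data are assumptions made for illustration:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ResultsDemo {
    // Hypothetical helper: counts every match of `pattern` across all lines,
    // with the keys sorted alphabetically by the TreeMap.
    static Map<String, Long> countMatches(Pattern pattern, Stream<String> lines) {
        return lines
            // Matcher.results() (Java 9+) yields one MatchResult per match in the line
            .flatMap(line -> pattern.matcher(line).results())
            .collect(Collectors.groupingBy(MatchResult::group, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Pattern word = Pattern.compile("[a-z]+");
        System.out.println(countMatches(word, Stream.of("b a", "a c"))); // prints {a=2, b=1, c=1}
    }
}
```

Unlike Scanner.findAll, this variant matches line by line, so a match can never span a line break.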

This answer contains a back-port of Scanner.findAll that can be used with Java 8.
