Java Streams:如何进行高效的“区分和排序"? [英] Java Streams: How to do an efficient "distinct and sort"?

查看:19
本文介绍了Java Streams:如何进行高效的“区分和排序"?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个 Stream 并且只想获取不同的元素并进行排序.

Let's assume I've got a Stream<T> and want to get only distinct elements and sorted.

最简单的方法是执行以下操作:

The naïve approach would be to do just the following:

Stream.of(...)
    .sorted()
    .distinct()

或者,也许反过来:

Stream.of(...)
    .distinct()
    .sorted()

由于 JDK 的源代码无法真正访问它们的实现,我只是想知道可能的内存消耗和性能影响.

Since the implementation of both of them is not really accessible by the JDK's source code I was just wondering about possible memory consumption and performance implications.

或者像下面这样编写自己的过滤器会更有效吗?

Or would it be even more efficient to write my own filter as the following?

Stream.of(...)
    .sorted()
    .filter(noAdjacentDuplicatesFilter())

public static Predicate<Object> noAdjacentDuplicatesFilter() {
    final Object[] previousValue = {new Object()};

    return value -> {
        final boolean takeValue = !Objects.equals(previousValue[0], value);
        previousValue[0] = value;
        return takeValue;
    };
}

推荐答案

当你在 sorted() 之后链接一个 distinct() 操作时,实现将利用已排序的数据的本质,避免构建内部HashSet,可以用下面的程序演示

When you chain a distinct() operation after sorted(), the implementation will utilize the sorted nature of the data and avoid building an internal HashSet, which can be demonstrated by the following program

public class DistinctAndSort {
    static int COMPARE, EQUALS, HASHCODE;
    static class Tracker implements Comparable<Tracker> {
        static int SERIAL;
        int id;
        Tracker() {
            id=SERIAL++/2;
        }
        public int compareTo(Tracker o) {
            COMPARE++;
            return Integer.compare(id, o.id);
        }
        public int hashCode() {
            HASHCODE++;
            return id;
        }
        public boolean equals(Object obj) {
            EQUALS++;
            return super.equals(obj);
        }
    }
    public static void main(String[] args) {
        System.out.println("adjacent sorted() and distinct()");
        Stream.generate(Tracker::new).limit(100)
              .sorted().distinct()
              .forEachOrdered(o -> {});
        System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",
                          COMPARE, EQUALS, HASHCODE);
        COMPARE=EQUALS=HASHCODE=0;
        System.out.println("now with intermediate operation");
        Stream.generate(Tracker::new).limit(100)
            .sorted().map(x -> x).distinct()
            .forEachOrdered(o -> {});
        System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",
                          COMPARE, EQUALS, HASHCODE);
    }
}

将打印

adjacent sorted() and distinct()
compareTo: 99, EQUALS: 99, HASHCODE: 0
now with intermediate operation
compareTo: 99, EQUALS: 100, HASHCODE: 200

中间操作,像map(x -> x)一样简单,不能被Stream实现识别,因此,它必须假设元素可能不会根据映射函数的结果进行排序.

The intermediate operation, as simple as map(x -> x), can’t be recognized by the Stream implementation, hence, it must assume that the elements might not be sorted in respect to the mapping function’s result.

不能保证这种优化会发生,但是,可以合理地假设 Stream 实现的开发人员不会删除该优化,甚至会尝试添加更多优化,因此滚动您自己的实现将阻止您的代码受益于未来的优化.

There is no guaranty that this kind of optimization happens, however, it is reasonable to assume that the developers of the Stream implementation will not remove that optimization and even try to add more optimizations, so rolling your own implementation will prevent your code from benefiting from future optimizations.

此外,您创建的是有状态的谓词",强烈不鼓励这样做,当然,与并行流一起使用时会中断.

Further, what you have created is a "stateful predicate", which is strongly discouraged, and, of course, will break when being used with a parallel stream.

如果您不相信 Stream API 能够足够有效地执行此操作,那么最好在没有 Stream API 的情况下实现此特定操作.

If you don’t trust the Stream API to perform this operation efficiently enough, you might be better off implementing this particular operation without the Stream API.

这篇关于Java Streams:如何进行高效的“区分和排序"?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆