Java Streams:如何有效地进行“独特和排序”? [英] Java Streams: How to do an efficient "distinct and sort"?

查看:146
本文介绍了Java Streams:如何有效地进行“独特和排序”?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们假设我有一个 Stream< T> ,并希望只获取不同的元素并进行排序。

Let's assume I've got a Stream<T> and want to get only distinct elements and sorted.

天真的方法是做到以下几点:

The naïve approach would be to do just the following:

Stream.of(...)
    .sorted()
    .distinct()

或者,可能相反:

Stream.of(...)
    .distinct()
    .sorted()

由于JDK源代码无法实现这两者的实现,我只是想知道可能的内存消耗和性能影响。

Since the implementation of both of them is not really accessible by the JDK's source code I was just wondering about possible memory consumption and performance implications.

或者编写我自己的过滤器会更有效率吗?

Or would it be even more efficient to write my own filter as the following?

Stream.of(...)
    .sorted()
    .filter(noAdjacentDuplicatesFilter())

public static Predicate<Object> noAdjacentDuplicatesFilter() {
    final Object[] previousValue = {new Object()};

    return value -> {
        final boolean takeValue = !Objects.equals(previousValue[0], value);
        previousValue[0] = value;
        return takeValue;
    };
}


推荐答案

当您链接 distinct() sorted()之后的操作,实现将利用数据的排序特性并避免构建内部 HashSet ,可通过以下程序演示

When you chain a distinct() operation after sorted(), the implementation will utilize the sorted nature of the data and avoid building an internal HashSet, which can be demonstrated by the following program

public class DistinctAndSort {
    static int COMPARE, EQUALS, HASHCODE;
    static class Tracker implements Comparable<Tracker> {
        static int SERIAL;
        int id;
        Tracker() {
            id=SERIAL++/2;
        }
        public int compareTo(Tracker o) {
            COMPARE++;
            return Integer.compare(id, o.id);
        }
        public int hashCode() {
            HASHCODE++;
            return id;
        }
        public boolean equals(Object obj) {
            EQUALS++;
            return super.equals(obj);
        }
    }
    public static void main(String[] args) {
        System.out.println("adjacent sorted() and distinct()");
        Stream.generate(Tracker::new).limit(100)
              .sorted().distinct()
              .forEachOrdered(o -> {});
        System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",
                          COMPARE, EQUALS, HASHCODE);
        COMPARE=EQUALS=HASHCODE=0;
        System.out.println("now with intermediate operation");
        Stream.generate(Tracker::new).limit(100)
            .sorted().map(x -> x).distinct()
            .forEachOrdered(o -> {});
        System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",
                          COMPARE, EQUALS, HASHCODE);
    }
}

将打印

adjacent sorted() and distinct()
compareTo: 99, EQUALS: 99, HASHCODE: 0
now with intermediate operation
compareTo: 99, EQUALS: 100, HASHCODE: 200

中间体操作,就像 map(x - > x)一样简单, Stream 实现无法识别,因此,它必须假设元素可能不会根据映射函数的结果进行排序。

The intermediate operation, as simple as map(x -> x), can’t be recognized by the Stream implementation, hence, it must assume that the elements might not be sorted in respect to the mapping function’s result.

没有保证这种优化会发生,但是,它是合理地假设Stream实现的开发人员不会删除该优化,甚至尝试添加更多优化,因此滚动您自己的实现将阻止您的代码从未来的优化中受益。

There is no guaranty that this kind of optimization happens, however, it is reasonable to assume that the developers of the Stream implementation will not remove that optimization and even try to add more optimizations, so rolling your own implementation will prevent your code from benefiting from future optimizations.

此外,你创建的是一个有状态谓词,即s非常气馁,当然,当与并行流一起使用时会中断。

Further, what you have created is a "stateful predicate", which is strongly discouraged, and, of course, will break when being used with a parallel stream.

如果您不信任Stream API来有效地执行此操作,那么如果没有Stream API,可能会更好地实现这个特定的操作。

If you don’t trust the Stream API to perform this operation efficiently enough, you might be better off implementing this particular operation without the Stream API.

这篇关于Java Streams:如何有效地进行“独特和排序”?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆