分组后对列表进行排序 [英] sorting Lists after groupingBy

查看:172
本文介绍了分组后对列表进行排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道,流(或收集器)中是否已经有一个已实现的功能,该功能已将列表作为值进行了排序.例如.以下代码均会产生按年龄分组的按性别分组的人员清单.第一个解决方案具有一些开销排序(看起来有些sc琐).第二种解决方案需要对每个人都进行两次检查,但是要很好地完成这项工作.

I am wondering, if there is already an implemented feature in streams (or Collectors) which has sorted Lists as values. E.g. the following codes both produce gender-grouped Lists of persons, sorted by age. The first solution has some overhead sorting (and looks a little bit scruffy). The second solution needs to look at every person twice but does the job in a pretty way.

先排序,然后在一个流中分组:

First sorting then grouping in one stream:

Map<Gender, List<Person>> sortedListsByGender = (List<Person>) roster
        .stream()
        .sorted(Person::compareByAge)
        .collect(Collectors.groupingBy(Person::getGender));

首先分组,然后对每个值进行排序:

First grouping, then sorting every value:

Map<Gender, List<Person>> sortedListsByGender = (List<Person>) roster
        .stream()
        .collect(Collectors.groupingBy(Person::getGender));
sortedListsByGender.values()
        .forEach(list -> Collections.sort(list, Person::compareByAge));

我只是想知道是否已经实现了某些功能,并且可以像groupingBySorted这样运行一次.

I am just wondering, if there is already something implemented, which does this in one run, like groupingBySorted.

推荐答案

collect操作之前在流上使用sorted(comparator)时,流必须缓冲整个流内容才能对其进行排序和排序与之后对组中较小的列表进行排序相比,可能需要在该缓冲区内进行更多的数据移动.因此,即使启用了并行处理,该实现也将利用多个内核,因此性能不如对单个组进行排序.

When using sorted(comparator) on the stream before the collect operation, the stream has to buffer the entire stream contents to be able to sort it and the sorting may involve much more data movement within that buffer, compared to sorting the smaller lists of the groups afterwards. So the performance is not as good as sorting the individual groups though the implementation will utilize multiple cores if parallel processing has been enabled.

但是请注意,使用sortedListsByGender.values().forEach(…)不能并行执行操作,即使使用sortedListsByGender.values().parallelStream().forEach(…)也只能允许对组进行并行处理,而每个排序操作仍然是顺序的.

But note that using sortedListsByGender.values().forEach(…) is not a parallelizable operation and even using sortedListsByGender.values().parallelStream().forEach(…) would only allow parallel processing of groups while each sort operation still is sequential.

像在收集器中那样执行排序操作时

When performing the sort operation within a collector as in

static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    return Collectors.collectingAndThen(
        Collectors.toCollection(ArrayList::new), l->{ l.sort(c); return l; } );
}

Map<Gender, List<Person>> sortedListsByGender = roster.stream()
    .collect(Collectors.groupingBy(Person::getGender, toSortedList(Person::compareByAge)));

排序操作的行为相同(感谢Tagir Valeev纠正了我),但是您可以轻松地检查插入排序策略的执行情况.只需将收集器实现更改为:

the sort operation behaves the same (thanks to Tagir Valeev for correcting me), but you can easily check how a sort-on-insertion strategy performs. Just change the collector implementation to:

static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    return Collectors.collectingAndThen(
        Collectors.toCollection(()->new TreeSet<>(c)), ArrayList::new);
}

为完整起见,如果您想要一个收集器,该收集器首先将已排序的插入到ArrayList中以避免最后的复制步骤,则可以使用更复杂的收集器,例如:

For completeness, if you want a collector which inserts sorted into an ArrayList in the first place to avoid the final copy step, you can use a more elaborated collector like this:

static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    return Collector.of(ArrayList::new,
        (l,t) -> {
            int ix=Collections.binarySearch(l, t, c);
            l.add(ix<0? ~ix: ix, t);
        },
        (list1,list2) -> {
            final int s1=list1.size();
            if(list1.isEmpty()) return list2;
            if(!list2.isEmpty()) {
                list1.addAll(list2);
                if(c.compare(list1.get(s1-1), list2.get(0))>0)
                    list1.sort(c);
            }
            return list1;
        });
}

对于顺序使用效率很高,但合并功能并非最佳选择.底层的排序算法将从预排序的范围中受益,但是尽管我们的合并功能实际上知道这些范围,但必须首先找到这些范围.不幸的是,JRE中没有公共API允许我们利用这些信息(有效;我们可以将subList传递给binarySearch,但是为list2的每个元素创建一个新的子列表可能会变得太昂贵了) .如果要进一步提高并行执行的性能,则必须重新实现排序算法的合并部分:

It’s efficient for the sequential usage but its merge function is not optimal. The underlying sort algorithm will benefit from presorted ranges but has to find these ranges first despite our merge function actually knows these ranges. Unfortunately, there’s no public API in the JRE allowing us to utilize these information (efficiently; we can pass subLists to binarySearch but creating a new sub list for each element of list2 may turn out to be too expensive). If we want to raise the performance of the parallel execution further, we have to re-implement the merge part of the sorting algorithm:

static <T> Collector<T,?,List<T>> toSortedList(Comparator<? super T> c) {
    return Collector.of(ArrayList::new,
        (l,t) -> l.add(insertPos(l, 0, l.size(), t, c), t),
        (list1,list2) -> merge(list1, list2, c));
}
static <T> List<T> merge(List<T> list1, List<T> list2, Comparator<? super T> c) {
    if(list1.isEmpty()) return list2;
    for(int ix1=0, ix2=0, num1=list1.size(), num2=list2.size(); ix2<num2; ix2++, num1++) {
        final T element = list2.get(ix2);
        ix1=insertPos(list1, ix1, num1, element, c);
        list1.add(ix1, element);
        if(ix1==num1) {
            while(++ix2<num2) list1.add(list2.get(ix2));
            return list1;
        }
    }
    return list1;
}
static <T> int insertPos(
    List<? extends T> list, int low, int high, T t, Comparator<? super T> c) {
    high--;
    while(low <= high) {
        int mid = (low+high)>>>1, cmp = c.compare(list.get(mid), t);
        if(cmp < 0) low = mid + 1;
        else if(cmp > 0) high = mid - 1;
        else {
            mid++;
            while(mid<=high && c.compare(list.get(mid), t)==0) mid++;
            return mid;
        }
    }
    return low;
}

请注意,与基于简单binarySearch的简单插入不同,最后一种解决方案是一种稳定的排序实现,即,在您的情况下,具有相同年龄且GenderPerson不会改变它们的相对顺序,如果源流具有已定义的遇到顺序.

Note that this last solution, unlike the simple binarySearch based insertion, is a stable sort implementation, i.e. in your case, Persons with the same age and Gender won’t change their relative order, if the source stream has a defined encounter order.

这篇关于分组后对列表进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆