Partial sort Collection with limit and custom Comparator

Question

I want to sort an ArrayList called imageList like this:

Collections.sort(imageList, new MapComparator(Function.KEY_TIMESTAMP, "dsc"));

This works fine, but now, for performance reasons, I want to be able to set a limit (show only the newest 100 images). The ArrayList is unsorted, so simply creating a sublist won't work.

My MapComparator class looks like this:

class MapComparator implements Comparator<HashMap<String, String>>
{
    private final String key;
    private final String order;

    public MapComparator(String key, String order)
    {
        this.key = key;
        this.order = order;
    }

    public int compare(HashMap<String, String> first,
                       HashMap<String, String> second)
    {
        String firstValue = first.get(key);
        String secondValue = second.get(key);
        if(this.order.toLowerCase().contentEquals("asc"))
        {
            return firstValue.compareTo(secondValue);
        }else{
            return secondValue.compareTo(firstValue);
        }

    }
}

Does anyone know how to implement that? Thanks in advance!

Recommended answer

I'm not aware of an official name for this kind of problem, but it occurs reasonably frequently, and it's often called a top-k or greatest-k problem.

You certainly have to process all the elements in the input, because the last element might belong in the "top k" set and you can't know that until you've processed every element. However, you don't have to sort the entire input. Sorting and then taking a sublist, or, with a stream, calling sorted() followed by limit(), can be very expensive, since with N input elements, sorting is O(N log N). It's possible to reduce the cost to O(N log k), which is effectively O(N) for a small fixed k, simply by keeping track of the greatest k elements seen so far as you run through the list.
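The idea of "keeping track of the greatest k elements seen so far" can be sketched without streams. The following is only an illustration under stated assumptions: the class name TopKSketch and the helpers viaFullSort and greatestK are hypothetical names, not part of the question or of any library. It contrasts the sort-everything baseline with a bounded min-heap, where each element costs O(log k) to process.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopKSketch {
    // Baseline (hypothetical helper): sort a copy of everything, then keep the first k.
    static <T> List<T> viaFullSort(List<T> input, int k, Comparator<? super T> comp) {
        List<T> copy = new ArrayList<>(input);
        copy.sort(comp);            // ascending according to comp
        Collections.reverse(copy);  // greatest first
        return new ArrayList<>(copy.subList(0, Math.min(k, copy.size())));
    }

    // Bounded-heap variant (hypothetical helper): never holds more than k + 1 elements.
    static <T> List<T> greatestK(Iterable<T> input, int k, Comparator<? super T> comp) {
        // Min-heap according to comp: the head is the smallest element retained so far.
        PriorityQueue<T> heap = new PriorityQueue<>(k + 1, comp);
        for (T t : input) {
            heap.add(t);
            if (heap.size() > k) {
                heap.poll(); // evict the smallest, keeping at most k elements
            }
        }
        // Polling yields ascending order, so reverse to get greatest first.
        List<T> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(5, 1, 9, 3, 7, 2);
        System.out.println(viaFullSort(input, 3, Comparator.naturalOrder())); // [9, 7, 5]
        System.out.println(greatestK(input, 3, Comparator.naturalOrder()));   // [9, 7, 5]
    }
}
```

Both helpers return the same elements in the same order when the elements are distinct under the comparator; only the amount of work differs.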

Guava has a Collector that does exactly this: Comparators.greatest(k, comparator).

If you don't want to use Guava, it's not too difficult to build your own collector that's more or less equivalent. A PriorityQueue is quite useful for this purpose. Here's a first cut at it:

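// Required imports:
// import java.util.Arrays;
// import java.util.Comparator;
// import java.util.List;
// import java.util.PriorityQueue;
// import java.util.stream.Collector;

static <T> Collector<T,PriorityQueue<T>,List<T>> topK(int k, Comparator<? super T> comp) {
    return Collector.of(
        // Supplier: a min-heap ordered by comp, so the head is the smallest retained element.
        () -> new PriorityQueue<>(k+1, comp),
        // Accumulator: add the element, then evict the smallest if more than k are held.
        (pq, t) -> {
            pq.add(t);
            if (pq.size() > k)
                pq.poll();
        },
        // Combiner (parallel streams): merge two partial heaps and trim back down to k.
        (pq1, pq2) -> {
            pq1.addAll(pq2);
            while (pq1.size() > k)
                pq1.poll();
            return pq1;
        },
        // Finisher: poll in ascending order while filling the array from the back,
        // so the resulting list runs from greatest to smallest.
        pq -> {
            int n = pq.size();
            @SuppressWarnings("unchecked")
            T[] a = (T[])new Object[n];
            while (--n >= 0)
                a[n] = pq.poll();
            return Arrays.asList(a);
        },
        Collector.Characteristics.UNORDERED);
}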

This uses a PriorityQueue as an intermediate data structure. As elements are added, the smallest element is trimmed off whenever the queue exceeds k in size. At the end, the elements are pulled from the queue and put into a list in reverse order, so the resulting list is sorted highest to lowest.

For example, given a List<Integer> containing

[920, 203, 880, 321, 181, 623, 496, 576, 854, 323,
 339, 100, 795, 165, 857, 935, 555, 648, 837, 975]

one can do

List<Integer> out = input.stream()
                         .collect(topK(5, Comparator.naturalOrder()));

resulting in

[975, 935, 920, 880, 857]

As an aside, it's possible to create a map comparator much more simply by using the combinator methods in the Comparator class. For example, suppose your input looks like this:

    List<Map<String, String>> input =
        List.of(Map.of("name", "map1", "timestamp", "00017"),
                Map.of("name", "map2", "timestamp", "00192"),
                Map.of("name", "map3", "timestamp", "00001"),
                Map.of("name", "map4", "timestamp", "00072"),
                Map.of("name", "map5", "timestamp", "04037"));

You can easily sort the maps by timestamp like this:

    input.stream()
         .sorted(Comparator.comparing(map -> map.get("timestamp")))
         .forEach(System.out::println);

Or collect them into a list, or sort in place using sort(comparator), or whatever. You can reverse the sort by doing:

    input.stream()
         .sorted(Comparator.comparing(map -> map.get("timestamp"), Comparator.reverseOrder()))
         .forEach(System.out::println);

The output of the latter will be:

{name=map5, timestamp=04037}
{name=map2, timestamp=00192}
{name=map4, timestamp=00072}
{name=map1, timestamp=00017}
{name=map3, timestamp=00001}
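The two parts of the answer can be combined to address the original question (keeping only the newest k images). The following is a sketch under assumptions, not a definitive implementation: the class name NewestImages and the helper newest are hypothetical, the topK collector is restated in compact form so the example compiles on its own, and the timestamps are assumed to be zero-padded strings of equal width, as in the sample data, so that string comparison matches chronological order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.stream.Collector;

class NewestImages {
    // The topK collector from earlier in the answer, restated compactly.
    static <T> Collector<T, PriorityQueue<T>, List<T>> topK(int k, Comparator<? super T> comp) {
        return Collector.of(
            () -> new PriorityQueue<>(k + 1, comp),
            (pq, t) -> { pq.add(t); if (pq.size() > k) pq.poll(); },
            (pq1, pq2) -> { pq1.addAll(pq2); while (pq1.size() > k) pq1.poll(); return pq1; },
            pq -> {
                // Polling yields ascending order; reverse so the greatest comes first.
                List<T> out = new ArrayList<>(pq.size());
                while (!pq.isEmpty()) out.add(pq.poll());
                Collections.reverse(out);
                return out;
            },
            Collector.Characteristics.UNORDERED);
    }

    // Hypothetical helper: the k maps with the greatest value under `key`, greatest first.
    static List<Map<String, String>> newest(List<Map<String, String>> images, String key, int k) {
        Comparator<Map<String, String>> byKey = Comparator.comparing(map -> map.get(key));
        return images.stream().collect(topK(k, byKey));
    }

    public static void main(String[] args) {
        List<Map<String, String>> input =
            List.of(Map.of("name", "map1", "timestamp", "00017"),
                    Map.of("name", "map2", "timestamp", "00192"),
                    Map.of("name", "map3", "timestamp", "00001"),
                    Map.of("name", "map4", "timestamp", "00072"),
                    Map.of("name", "map5", "timestamp", "04037"));

        // Prints map5, then map2: the two entries with the largest timestamps.
        newest(input, "timestamp", 2).forEach(m -> System.out.println(m.get("name")));
    }
}
```

For the question's case, the call would presumably look like newest(imageList, Function.KEY_TIMESTAMP, 100), assuming imageList can be passed as a List<Map<String, String>>. If the timestamps are not fixed-width, the comparator would need to parse them into numbers first.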
