Java 8 Stream,如何获得前N个计数? [英] Java 8 Stream, How to get Top N count?

查看:156
本文介绍了Java 8 Stream,如何获得前N个计数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要你的建议来简化下面的代码。我有一个玩家列表,其中包含赢得的游戏ID。我想从这个列表中提取2个最佳玩家(2个匹配ID更高的玩家)
一旦提取,我必须返回初始列表进行其他操作。
我认为可以在优化或阅读方面改进此代码。如果你能帮助我。

I need your advice to simplify this code below. I have a player list with an ID of the games won. I want to extract the 2 best players from this list (the 2 players who have a better amount of match Id) Once extracted, I have to return the initial list to do other operations. I think it is possible to improve this code in terms of optimization or reading. If you can help me.

public class PlayerStatistics {
    int id
    String name;
    int idMatchWon; // key from Match

    // getter , setter
}

public static void main(String[] args) throws Exception {

    List<PlayerStatistics> _players = new ArrayList<PlayerStatistics>();

    _players.add(initialize(1,'John',4));
    _players.add(initialize(2,'Teddy',2));
    _players.add(initialize(3,'Kevin',3));

    // How to get Top 2
    List<PlayerStatistics> _top2Players = extractTop2Players(_players);
}

private List<PlayerStatistics> extractTop2Players (List<PlayerStatistics> _list) {

    List<PlayerStatistics> _topPlayers = new ArrayList<PlayerStatistics>();

    // 1. Group and count 
    Map<String, Long> _players = _list
            .stream()
            .filter(x -> (!"".equals(x.getName()) && x.getName()!= null) )
            .collect(
                    Collectors.groupingBy(
                            PlayerStatistics::getName, Collectors.counting()
                    )
            );
    ;

    // 2 Best Palyers
    Set<String> _sortedPlayers = _players.entrySet().stream()
            .sorted(Map.Entry.comparingByValue(Collections.reverseOrder()))
            .limit(2)
            .map(Entry::getKey)
            .collect(Collectors.toSet())
    ;

    // 3. Rebuild list 
    _topPlayers = _list
            .stream()
            .filter(x -> _sortedPlayers.contains(x.getName()))
            .collect(Collectors.toList())
    ;

    return _topPlayers;
}


private PlayerStatistics initialize (int id, String name, int year, int month, int won, int lost) {
    return 
        new PlayerStatistics()
            .withId(id)
            .withName(name)
            .withIdMatchWon(won)
        );
}


推荐答案

首先,让我们说明一下你的代码绝对正确。它做了需要做的事情,甚至通过使用集合进行优化。不过可以通过两种方式进一步改进:

First of all, let's state that your code is absolutely correct. It does what needs to be done and it's even optimized by using sets. It can be further improved in two ways, though:


  1. 时间复杂度:你正在排序整个数据集,其时间复杂度为 O(mlogm),其中 m 是您的初始列表的大小玩家。您立即获取列表中的 N 元素, N<< m

  1. Time complexity: you are sorting the whole dataset, which has a time complexity of O(mlogm), with m being the size of your initial list of players. Immediately, you are taking the top N elements of your list, with N << m.

下面我展示了一种将算法的时间复杂度提高到 O(mlogN)<的方法/ code>,这意味着在你的特定情况下它会变成 O(m)(这是因为 N = 2 ,所以 logN = log2 = 1 )。

Below I'm showing a way to improve time complexity of the algorithm to O(mlogN), which means that in your specific case it would become O(m) (this is because N=2, so logN=log2=1).

您正在遍历数据集3时间:首先你要迭代玩家列表以创建计数地图,然后你正在迭代这个地图以获得一个顶部 N 玩家的集合,最后你'再次重复播放器列表,检查每个玩家是否属于顶级 N 玩家。

You are traversing the dataset 3 times: first you're iterating the list of players to create the map of counts, then you are iterating this map to get a set with the top N players, and finally you're iterating the list of players again to check whether each player belongs to the set of top N players.

这可以改进,只对数据集执行2次传递:第一次创建计数图(类似于您已经完成的),另一次创建一个只保留顶部<$ c $的结构c> N 元素,按递减次数排序,结果准备好在遍历完成后返回。

This can be improved to perform only 2 passes over the dataset: the first one to create a map of counts (similar to what you've already done) and the other one to create a structure that will keep only the top N elements, sorted by count descending, with the result ready to be returned once the traversal has finished.

重要提示:以下解决方案要求您的 PlayerStatistics 类实现 hashCode 等于方法一致。

Important: the solution below requires that your PlayerStatistics class implements the hashCode and equals methods consistently.

首先我们有一个通用方法 topN (不出意外)提取顶部 N 来自任何给定地图的元素。它通过按值比较它的条目,降序来实现这一点(在此版本中,值 V 必须可比较< V> ,但通过提供自定义比较器< V> Comparable< V> 的值c $ c>):

First we have a generic method topN that (not surprisingly) extracts the top N elements from any given map. It does this by comparing its entries by value, descending (in this version, values V must be Comparable<V>, but this algorithm can be easily extended to support values that don't implement Comparable<V> by providing a custom Comparator<V>):

public static 
<K, V extends Comparable<? super V>, T extends Comparable<? super T>>
Collection<K> 
topN(
        Map<K, V> map, 
        int N,
        Function<? super K, ? extends T> tieBreaker) {

    TreeMap<Map.Entry<K, V>, K> topN = new TreeMap<>(
        Map.Entry.<K, V>comparingByValue()      // by value descending, then by key
            .reversed()                         // to allow entries with duplicate values
            .thenComparing(e -> tieBreaker.apply(e.getKey())));

    map.entrySet().forEach(e -> {
      topN.put(e, e.getKey());
      if (topN.size() > N) topN.pollLastEntry();
    });

    return topN.values();
}

这里 topN TreeMap 表现为优先级队列,大小 N (虽然我们加起来是 N + 1 元素)。首先我们将条目放入 topN 地图中,然后,如果地图有超过 N 条目,我们会立即调用 pollLastEntry 方法,删除优先级最低的条目(根据 TreeMap 的键的顺序)。这保证了在遍历时, topN 地图将只包含已经排序的顶部 N 条目。

Here the topN TreeMap behaves as a priority queue of size N (though we add up to N+1 elements). First we put the entry into the topN map, then, if the map has more than N entries, we immediately invoke the pollLastEntry method on it, which removes the entry with the lowest priority (according to the order of the keys of the TreeMap). This guarantees that upon traversal, the topN map will only contain the top N entries, already sorted.

请注意,我使用的比较器首先对 TreeMap< Map.Entry< K,V>,K> 进行排序值 V 按降序排列,然后按键 K 。这是在功能<?的帮助下实现的。超级K,?延伸T> tieBreaker 函数,它将每个键 K 转换为值 T ,必须可比< T> 。所有这些都允许地图包含重复值 V 的条目,而不需要 K 可比较< K>

Note that I'm using a comparator that first sorts the TreeMap<Map.Entry<K, V>, K> by values V in descending order, and then by keys K. This is achieved with the help of the Function<? super K, ? extends T> tieBreaker function, which transforms each key K to a value T that must be Comparable<T>. All this allows the map to contain entries with duplicate values of V, without requiring keys K to also be Comparable<K>.

最后,您将使用以下方法:

Finally, you'd use the above method as follows:

Map<PlayerStatistics, Long> counts = yourInitialListOfPlayers.stream()
    .filter(x -> !"".equals(x.getName()) && x.getName() != null)
    .collect(Collectors.groupingBy(x -> x, Collectors.counting()));

Collection<PlayerStatistics> top2 = topN(counts, 2, PlayerStatistics::getName);

这篇关于Java 8 Stream,如何获得前N个计数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆