Java:排序的集合,允许重复,是高效的内存,并提供快速插入+更新 [英] Java: Sorted collection which allows duplicates, is memory efficient and provides fast insert + update

查看:136
本文介绍了Java:排序的集合,允许重复,是高效的内存,并提供快速插入+更新的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

具体来说,我需要一个使用一个字段A进行访问的集合,而另一个字段(A)用于排序,但是一个接收重复的排序集合就足够了。

Specifically I need a collection which uses one field A for accessing and a different one (field S) for sorting but a sorted collection which accepts duplicate would be sufficient.

我经常来到这一点,我需要这个集合,TreeMap不是一个选项,因为它不允许重复。所以现在是时候在这里问。有一些解决方法如stackoverflow 此处这里 - 即有:

I often come to this point where I need exactly this collection and TreeMap is not an option as it does not allow duplicates. So now it is time to ask here. There are several workarounds as pointed out on stackoverflow here and here - namely there are:


  • PriorityQueue :slow update (Object)+ add(Object))和原始键的拳击

  • 斐波纳契堆:内存浪费(?)

  • TreeMap< Field_S,List< Value>> :我的问题是列表的内存开销, / li>
  • 排序列表或数组:问题是慢插入和删除 - >我应该实现一个分段排序列表吗?

  • TreeMultimap 来自番石榴( docs ):外部依赖和可能内存无效(?)

  • PriorityQueue: slow update (remove(Object) + add(Object)), and boxing of primitive keys
  • Fibonacci heap: memory waste (?)
  • TreeMap<Field_S, List<Value>>: problem for me is the memory overhead of the list, and boxing of primitive keys
  • sorted list or array: problem is the slow insert and remove -> should I implement one segmented sorted list?
  • TreeMultimap from guava (docs): external dependency and probably memory inefficient (?)

任何人有更好的建议?或者我应该扮演我自己的排序数据结构(哪一个?)?还有其他来源(在Java中,开放源码,单元测试和小程序)将是很好的。

Anyone with better suggestions? Or should I role my own sorted datastructure (which one?)? Also other sources (in Java, open source, with unit tests and small deps) would be nice.

更新

目前有关我用例的更多细节(尽管上次有类似的需求)。我有一个收藏(数百万)的参考文献,我想要能够

More details on my use case at the moment (although I'm having similar demand in the last time). I have a collection (with millions) of references where I want to be able


  • 轮询或获取关于字段S的最小元素

  • ,并在字段A的帮助下更新字段S

  • 字段S的相同值可能会发生。字段A实际上是指向另一个数组的整数

  • 唯一的依赖关系是trove4j。如果需要,我可以使用不同的mahout集合。但不是番石榴,尽管一个很好的库,收藏品没有被调整为记忆效率(拳击/拆箱)。

所以所有的哭泣一个fibonacci堆,但我担心它有太多的开销每个元素 - >这是我想到一个更多的内存有效的排序+分段数组解决方案的原因。

So all cries for a fibonacci heap but I fear it has too many overhead per element -> that was the reason I thought about a more memory efficient "sorted+segmented array" solution.

推荐答案

我决定滚动自己,但不是最佳解决方案,只是一个TreeMap变体。我会保持这个更新,如果我会微调这个集合关于记忆。速度已经比以前的PriorityQueue尝试好多了,因为我需要collection.remove(Object)方法(用于更新条目):

I decided to roll my own but not the optimal solution just a TreeMap variant. I'll keep this updated if I'll fine tune this collection regarding memory. Speed is already a lot better then the previous PriorityQueue attempt as I needed the collection.remove(Object) method (for updating an entry):

package com.graphhopper.coll;

import gnu.trove.iterator.TIntIterator;
import gnu.trove.set.hash.TIntHashSet;
import java.util.Map.Entry;
import java.util.TreeMap;

/**
 * A priority queue implemented by a treemap to allow fast key update. Or should we use a standard
 * b-tree?
 */
public class MySortedCollection {

    private int size;
    private int slidingMeanValue = 20;
    private TreeMap<Integer, TIntHashSet> map;

    public MySortedCollection(int size) {
        map = new TreeMap<Integer, TIntHashSet>();
    }

    void remove(int key, int value) {
        TIntHashSet set = map.get(value);
        if (set == null || !set.remove(key))
            throw new IllegalStateException("cannot remove key " + key + " with value " + value
                    + " - did you insert " + key + "," + value + " before?");
        size--;
        if (set.isEmpty())
            map.remove(value);
    }

    public void update(int key, int oldValue, int value) {
        remove(key, oldValue);
        insert(key, value);
    }

    public void insert(int key, int value) {
        TIntHashSet set = map.get(value);
        if (set == null)
            map.put(value, set = new TIntHashSet(slidingMeanValue));
//        else
//            slidingMeanValue = Math.max(5, (slidingMeanValue + set.size()) / 2);
        if (!set.add(key))
            throw new IllegalStateException("use update if you want to update " + key);
        size++;
    }

    public int peekValue() {
        if (size == 0)
            throw new IllegalStateException("collection is already empty!?");
        Entry<Integer, TIntHashSet> e = map.firstEntry();
        if (e.getValue().isEmpty())
            throw new IllegalStateException("internal set is already empty!?");
        return map.firstEntry().getKey();
    }

    public int peekKey() {
        if (size == 0)
            throw new IllegalStateException("collection is already empty!?");
        TIntHashSet set = map.firstEntry().getValue();
        if (set.isEmpty())
            throw new IllegalStateException("internal set is already empty!?");
        return set.iterator().next();
    }

    public int pollKey() {
        size--;
        if (size < 0)
            throw new IllegalStateException("collection is already empty!?");
        Entry<Integer, TIntHashSet> e = map.firstEntry();
        TIntHashSet set = e.getValue();
        TIntIterator iter = set.iterator();
        if (set.isEmpty())
            throw new IllegalStateException("internal set is already empty!?");
        int val = iter.next();
        iter.remove();
        if (set.isEmpty())
            map.remove(e.getKey());
        return val;
    }

    public int size() {
        return size;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public int getSlidingMeanValue() {
        return slidingMeanValue;
    }

    @Override
    public String toString() {
        return "size " + size + " min=(" + peekKey() + "=>" + peekValue() + ")";
    }
}

这篇关于Java:排序的集合,允许重复,是高效的内存,并提供快速插入+更新的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆