Automatically sorted by values map in Java

Question

I need a map in Java that is automatically sorted by its values, so that it stays sorted at all times while I add new key-value pairs, update the value of an existing pair, or delete entries.

Please also keep in mind that this map is going to be really big (hundreds of thousands, or even tens of millions, of entries).

So basically I'm looking for the following functionality:

Suppose we had a class 'SortedByValuesMap' that implements the functionality described above, and the following code:

SortedByValuesMap<String, Long> sorted_map = new SortedByValuesMap<String, Long>();
sorted_map.put("apples", 4L);
sorted_map.put("oranges", 2L);
sorted_map.put("bananas", 1L);
sorted_map.put("lemons", 3L);
sorted_map.put("bananas", 6L);

for (String key : sorted_map.keySet()) {
  System.out.println(key + ":" + sorted_map.get(key));
}

The output should be:

bananas:6
apples:4
lemons:3
oranges:2

In particular, what really matters to me is being able to get the entry with the lowest value at any time, using a command like:

smallestItem = sorted_map.lastEntry();

which should give me the 'oranges' entry.
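SortedByValuesMap is not a JDK class. The following is a minimal sketch of how such a class could behave, assuming String keys and long values; the class and all of its methods are hypothetical. It keeps a HashMap for lookups plus a TreeSet of keys ordered by value (highest first). The key itself is used as a tie-breaker, so two keys with equal values are not treated as duplicates by the TreeSet.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: String keys, long values, kept sorted by value (highest first).
public class SortedByValuesMap {
    // key -> current value; the source of truth for lookups
    private final Map<String, Long> values = new HashMap<>();

    // keys ordered by value descending, ties broken by key so distinct keys never compare as equal
    private final TreeSet<String> order = new TreeSet<>((a, b) -> {
        int byValue = Long.compare(values.get(b), values.get(a));
        return byValue != 0 ? byValue : a.compareTo(b);
    });

    public void put(String key, long value) {
        // remove BEFORE changing the value, otherwise the TreeSet cannot find the key
        if (values.containsKey(key)) {
            order.remove(key);
        }
        values.put(key, value);
        order.add(key);
    }

    public Long get(String key) {
        return values.get(key);
    }

    public void remove(String key) {
        if (values.containsKey(key)) {
            order.remove(key); // still uses the old value to locate the key
            values.remove(key);
        }
    }

    // read-only view of the keys, highest value first
    public SortedSet<String> keySet() {
        return Collections.unmodifiableSortedSet(order);
    }

    // entry with the lowest value, or null if the map is empty
    public Map.Entry<String, Long> lastEntry() {
        if (order.isEmpty()) {
            return null;
        }
        String key = order.last();
        return new AbstractMap.SimpleImmutableEntry<>(key, values.get(key));
    }

    public static void main(String[] args) {
        SortedByValuesMap sortedMap = new SortedByValuesMap();
        sortedMap.put("apples", 4);
        sortedMap.put("oranges", 2);
        sortedMap.put("bananas", 1);
        sortedMap.put("lemons", 3);
        sortedMap.put("bananas", 6);
        // prints bananas:6, apples:4, lemons:3, oranges:2 (one per line)
        for (String key : sortedMap.keySet()) {
            System.out.println(key + ":" + sortedMap.get(key));
        }
        System.out.println(sortedMap.lastEntry()); // oranges=2
    }
}
```

Because the TreeSet's comparator reads values from the HashMap, a key has to be taken out of the TreeSet before its value changes and re-inserted afterwards, which put() does. Each operation is O(log n). Every key is held in both structures, so the per-entry memory overhead is worth measuring at tens of millions of entries.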

EDIT: I am a Java newbie, so please elaborate a bit in your answers. Thanks!

EDIT2: This might help: I am using this to count words (for those who are familiar: n-grams in particular) in huge text files. So I need to build a map where the keys are words and the values are the frequencies of those words. However, due to limitations (like RAM), I want to keep only the X most frequent words, but of course you can't know beforehand which words will be the most frequent. So the way I thought it might work (as an approximation) is to start counting words, and when the map reaches a size limit (say, 1 million entries), delete the least frequent entry so that the map's size always stays at 1 million.

Solution

Keep 2 data structures:

  • A dictionary of words -> count. Just use an ordinary HashMap<String, Long>.
  • An "array" to keep track of order, such that arr[count] holds a Set<String> of the words with that count.

    I'm writing this as though it were an array as a notational convenience. In fact, you probably don't know an upper bound on the number of occurrences, so you need a resizable data structure. Implement using a Map<Long, Set<String>>. Or, if that uses too much memory, use an ArrayList<Set<String>> (you'll have to test for count == size() - 1, and if so, use add() instead of set(count + 1)).
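The ArrayList<Set<String>> alternative from the second bullet can be sketched as follows. This is an illustration rather than the answerer's code: the class name ArrayListTally and the method wordsWithCount are hypothetical, and counts are stored as int because ArrayList indices are int. Index 0 is kept as an empty placeholder so that the list index lines up with the count, mirroring the arr[count] notation above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the ArrayList<Set<String>> variant; counts are int because list indices are int.
public class ArrayListTally {
    // word -> number of occurrences
    private final Map<String, Integer> dict = new HashMap<>();
    // byCount.get(c) holds the words seen exactly c times; index 0 is an always-empty placeholder
    private final List<Set<String>> byCount = new ArrayList<>();

    public ArrayListTally() {
        byCount.add(new HashSet<>());
    }

    public void tally(String word) {
        int count = dict.getOrDefault(word, 0);
        dict.put(word, count + 1);
        byCount.get(count).remove(word); // no-op when count == 0
        if (count == byCount.size() - 1) {
            // index count + 1 does not exist yet, so grow the list with add() instead of set()
            byCount.add(new HashSet<>());
        }
        byCount.get(count + 1).add(word);
    }

    // words seen exactly `count` times (empty if no word has that count)
    public Set<String> wordsWithCount(int count) {
        if (count < 0 || count >= byCount.size()) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(byCount.get(count));
    }
}
```

The list has one slot per count value up to the highest count seen, so a single very frequent word makes it long even if most slots are empty; the Map<Long, Set<String>> variant only stores counts that actually occur.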

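To increment the number of occurrences for a word (Java, with arr implemented as a Map<Long, Set<String>>):

// assumes data structures are in instance variables
//   dict: Map<String, Long>        (word -> count)
//   arr:  Map<Long, Set<String>>   (count -> words with that count)
// and that java.util.* is imported
public void tally(final String word)
{
    final long count = this.dict.getOrDefault(word, 0L);
    this.dict.put(word, count + 1);
    // move word up one place in arr
    if (count > 0) {
        final Set<String> words = this.arr.get(count);
        words.remove(word);   // This is why we use a Set: for fast deletion here.
        if (words.isEmpty()) this.arr.remove(count);   // don't keep empty sets around
    }
    this.arr.computeIfAbsent(count + 1, k -> new HashSet<>()).add(word);
}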

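To iterate over words in order of increasing count (Java, assuming arr is a TreeMap<Long, Set<String>> so that its keys come out sorted):

for (final Map.Entry<Long, Set<String>> entry : this.arr.entrySet())
    for (final String word : entry.getValue())
        process(word, entry.getKey());   // process() is whatever you want to do with each word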
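The answer covers counting and iterating; the asker's EDIT2 also needs eviction once a size limit is reached. Below is a hedged sketch that combines the two structures with that eviction step. The class name BoundedWordCounter, the field maxWords, and the method evictLeastFrequent are hypothetical, and arr is implemented as a TreeMap named byCount so that firstEntry() returns the lowest count in O(log n).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: word counter that keeps at most maxWords distinct words.
public class BoundedWordCounter {
    // word -> count
    private final Map<String, Long> dict = new HashMap<>();
    // count -> words with exactly that count; TreeMap keeps counts in ascending order
    private final TreeMap<Long, Set<String>> byCount = new TreeMap<>();
    private final int maxWords;

    public BoundedWordCounter(int maxWords) {
        this.maxWords = maxWords;
    }

    public void tally(String word) {
        long count = dict.getOrDefault(word, 0L);
        if (count > 0) {
            removeFromBucket(count, word);
        }
        dict.put(word, count + 1);
        byCount.computeIfAbsent(count + 1, k -> new HashSet<>()).add(word);
        if (dict.size() > maxWords) {
            evictLeastFrequent();
        }
    }

    private void removeFromBucket(long count, String word) {
        Set<String> bucket = byCount.get(count);
        bucket.remove(word);
        if (bucket.isEmpty()) {
            byCount.remove(count); // drop empty buckets so firstEntry() always has words
        }
    }

    // removes one (arbitrary) word among those with the lowest count
    private void evictLeastFrequent() {
        Map.Entry<Long, Set<String>> lowest = byCount.firstEntry();
        String victim = lowest.getValue().iterator().next();
        removeFromBucket(lowest.getKey(), victim);
        dict.remove(victim);
    }

    public Long get(String word) {
        return dict.get(word);
    }

    public int size() {
        return dict.size();
    }
}
```

Ties at the lowest count are broken by HashSet iteration order, so a word that was just added with count 1 can itself be the one evicted. That follows from the approximation described in EDIT2.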
