在 Java 中增加 Map 值的最有效方法 [英] Most efficient way to increment a Map value in Java

查看:42
本文介绍了在 Java 中增加 Map 值的最有效方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望这个问题对于这个论坛来说不是太基础,但我们会看到.我想知道如何重构一些代码以获得更好的性能,这些代码运行了很多次.

I hope this question is not considered too basic for this forum, but we'll see. I'm wondering how to refactor some code for better performance that is getting run a bunch of times.

假设我正在创建一个词频列表,使用 Map(可能是一个 HashMap),其中每个键是一个字符串,其中包含正在计算的单词,值是一个 Integer,每次单词的标记出现时都会增加找到了.

Say I'm creating a word frequency list, using a Map (probably a HashMap), where each key is a String with the word that's being counted and the value is an Integer that's incremented each time a token of the word is found.

在 Perl 中,增加这样的值非常容易:

In Perl, incrementing such a value would be trivially easy:

$map{$word}++;

但在 Java 中,它要复杂得多.这是我目前的做法:

But in Java, it's much more complicated. Here the way I'm currently doing it:

int count = map.containsKey(word) ? map.get(word) : 0;
map.put(word, count + 1);

这当然依赖于较新的 Java 版本中的自动装箱功能.我想知道您是否可以建议一种更有效的方法来增加这样的值.避开 Collections 框架并使用其他框架是否有更好的性能原因?

Which of course relies on the autoboxing feature in the newer Java versions. I wonder if you can suggest a more efficient way of incrementing such a value. Are there even good performance reasons for eschewing the Collections framework and using a something else instead?

更新:我已经对几个答案进行了测试.见下文.

Update: I've done a test of several of the answers. See below.

推荐答案

部分测试结果

对于这个问题,我得到了很多很好的答案——谢谢大家——所以我决定运行一些测试并找出哪种方法实际上最快.我测试的五种方法是:

Some test results

I've gotten a lot of good answers to this question--thanks folks--so I decided to run some tests and figure out which method is actually fastest. The five methods I tested are these:

  • 我在中介绍的ContainsKey"方法问题
  • Aleksandar Dimitrov 建议的TestForNull"方法
  • Hank Gay 建议的AtomicLong"方法
  • jrudolph 建议的Trove"方法
  • phax.myopenid.com 建议的MutableInt"方法

这就是我所做的...

  1. 创建了五个相同的类,不同之处如下所示.每个班级都必须执行我介绍的场景中的典型操作:打开一个 10MB 的文件并将其读入,然后对文件中的所有单词进行频率计数.由于这平均只用了 3 秒,我让它执行了 10 次频率计数(不是 I/O).
  2. 对 10 次迭代的循环进行计时,但不是 I/O 操作,并基本上使用 Ian Darwin's 方法
  3. 连续进行了所有五次测试,然后又进行了三次.
  4. 平均每种方法的四个结果.

结果

我会先展示结果,下面的代码给感兴趣的人.

Results

I'll present the results first and the code below for those who are interested.

正如预期的那样,ContainsKey 方法是最慢的,所以我将给出每种方法的速度与该方法的速度的比较.

The ContainsKey method was, as expected, the slowest, so I'll give the speed of each method in comparison to the speed of that method.

  • ContainsKey: 30.654 秒(基线)
  • AtomicLong: 29.780 秒(快 1.03 倍)
  • TestForNull: 28.804 秒(快 1.06 倍)
  • Trove: 26.313 秒(快 1.16 倍)
  • MutableInt: 25.747 秒(快 1.19 倍)
  • ContainsKey: 30.654 seconds (baseline)
  • AtomicLong: 29.780 seconds (1.03 times as fast)
  • TestForNull: 28.804 seconds (1.06 times as fast)
  • Trove: 26.313 seconds (1.16 times as fast)
  • MutableInt: 25.747 seconds (1.19 times as fast)

看起来只有 MutableInt 方法和 Trove 方法明显更快,因为只有它们才能将性能提升 10% 以上.但是,如果线程是一个问题,AtomicLong 可能比其他的更有吸引力(我不太确定).我还使用 final 变量运行了 TestForNull,但差异可以忽略不计.

It would appear that only the MutableInt method and the Trove method are significantly faster, in that only they give a performance boost of more than 10%. However, if threading is an issue, AtomicLong might be more attractive than the others (I'm not really sure). I also ran TestForNull with final variables, but the difference was negligible.

请注意,我没有分析不同场景中的内存使用情况.我很乐意听取任何对 MutableInt 和 Trove 方法可能如何影响内存使用情况有深刻见解的人的意见.

Note that I haven't profiled memory usage in the different scenarios. I'd be happy to hear from anybody who has good insights into how the MutableInt and Trove methods would be likely to affect memory usage.

就我个人而言,我发现 MutableInt 方法最有吸引力,因为它不需要加载任何第三方类.因此,除非我发现它的问题,否则我最有可能采用这种方式.

Personally, I find the MutableInt method the most attractive, since it doesn't require loading any third-party classes. So unless I discover problems with it, that's the way I'm most likely to go.

这是每个方法的关键代码.

Here is the crucial code from each method.

import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
int count = freq.containsKey(word) ? freq.get(word) : 0;
freq.put(word, count + 1);

TestForNull

import java.util.HashMap;
import java.util.Map;
...
Map<String, Integer> freq = new HashMap<String, Integer>();
...
Integer count = freq.get(word);
if (count == null) {
    freq.put(word, 1);
}
else {
    freq.put(word, count + 1);
}

AtomicLong

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
...
final ConcurrentMap<String, AtomicLong> map = 
    new ConcurrentHashMap<String, AtomicLong>();
...
map.putIfAbsent(word, new AtomicLong(0));
map.get(word).incrementAndGet();

宝藏

import gnu.trove.TObjectIntHashMap;
...
TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
...
freq.adjustOrPutValue(word, 1, 1);

MutableInt

import java.util.HashMap;
import java.util.Map;
...
class MutableInt {
  int value = 1; // note that we start at 1 since we're counting
  public void increment () { ++value;      }
  public int  get ()       { return value; }
}
...
Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
...
MutableInt count = freq.get(word);
if (count == null) {
    freq.put(word, new MutableInt());
}
else {
    count.increment();
}

这篇关于在 Java 中增加 Map 值的最有效方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆