Why is String.intern() so slow?
Question
Before anyone questions the fact that I use String.intern() at all, let me say that I need it in my particular application for memory and performance reasons. [1]
So far I have used String.intern() and assumed it was the most efficient way to do it. However, I have noticed for ages that it is a bottleneck in the software. [2]
Then, just recently, I tried to replace String.intern() with a huge map into which I put/get the strings in order to obtain a unique instance each time. I expected this to be slower... but it was exactly the opposite! It was tremendously faster! Replacing intern() with pushing/polling a map (which achieves exactly the same thing) made this step more than an order of magnitude faster.
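The map-based replacement described above can be sketched roughly as follows. This is an assumption about its shape, not the author's code: the class MapInterner is hypothetical, and it uses a plain HashMap, so it is only safe when a single thread does the interning (String.intern() itself is safe to call from multiple threads).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the "huge map" approach from the question.
// Not thread-safe: a plain HashMap assumes a single interning thread.
final class MapInterner {
    // Each distinct string value maps to the one canonical instance kept for it.
    private final Map<String, String> pool = new HashMap<>();

    String intern(String s) {
        String canonical = pool.get(s); // look up an existing instance with equal contents
        if (canonical == null) {
            pool.put(s, s);             // first occurrence of this value: it becomes canonical
            canonical = s;
        }
        return canonical;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = interner.intern(new String("token"));
        String b = interner.intern(new String("token"));
        System.out.println(a == b);          // true: both calls return the same instance
        System.out.println(interner.size()); // 1: only one distinct value stored
    }
}
```

A map from each string to itself is used rather than a HashSet because java.util.Set has no method that returns the stored element equal to a given one. For concurrent use, ConcurrentHashMap.putIfAbsent(s, s), which returns the previously stored value or null, could take the place of the get/put pair.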
The question is: why is intern() so slow?!? Why isn't it simply backed by a map (or actually, just a customized set), which would be tremendously faster? I'm puzzled.
[1]: For the unconvinced: this is natural language processing over gigabytes of text, so it needs to avoid holding many instances of the same string, both so that memory does not blow up and so that strings can be compared by reference, which is fast enough.
[2]: Without it (using normal strings) it is impossible; with it, this particular step remains the most computation-intensive one.
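Footnote [1] relies on the fact that interned strings with equal contents are the same object, so == can stand in for equals(). A small illustration; the class ReferenceComparisonDemo and its helper fresh() are hypothetical names chosen for this sketch:

```java
// Illustrates why interning enables referential comparison:
// equal strings built at run time are distinct objects until they are interned.
final class ReferenceComparisonDemo {
    // Builds a fresh String object at run time (never the literal instance itself).
    static String fresh(String s) {
        return new StringBuilder(s).toString();
    }

    public static void main(String[] args) {
        String w1 = fresh("language");
        String w2 = fresh("language");

        System.out.println(w1.equals(w2)); // true: same characters
        System.out.println(w1 == w2);      // false: two separate objects

        String i1 = w1.intern();
        String i2 = w2.intern();
        System.out.println(i1 == i2);      // true: intern() returns one canonical instance
    }
}
```

Reference comparison is a single pointer check, whereas equals() on two distinct objects of the same length has to compare their characters.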
EDIT:
Due to the surprising interest in this post, here is some code to test it out:
And the results of interning a bit more than 1 million strings:
- HashMap: 4 seconds
- String.intern(): 54 seconds
To rule out warm-up effects, OS I/O caching and the like, the experiment was repeated with the order of the two benchmarks reversed:
- String.intern(): 69 seconds
- HashMap: 3 seconds
As you can see, the difference is very noticeable: more than tenfold in both runs (54 s vs. 4 s, then 69 s vs. 3 s). (Using OpenJDK 1.6.0_22 64-bit... but I think the Sun JVM gave similar results.)
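A comparison along these lines might be sketched as below. It is not the author's benchmark code: the class InternBenchmark, the synthetic token list (1,000,000 tokens over 200,000 distinct values) and the single System.nanoTime() measurement per approach are all assumptions, so the printed times are only a rough indication. A harness such as JMH would control warm-up and repetitions more rigorously.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical micro-benchmark: interns the same inputs once with String.intern()
// and once with a HashMap, timing each pass with System.nanoTime().
// Rough indication only: no JIT warm-up control and no repetitions.
final class InternBenchmark {

    // Generates n tokens cycling through `distinct` values; every token is a
    // separate String object, loosely mimicking words read from a large text.
    static List<String> makeTokens(int n, int distinct) {
        List<String> tokens = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tokens.add(new StringBuilder("word").append(i % distinct).toString());
        }
        return tokens;
    }

    // Elapsed nanoseconds for interning every token via the JVM string table.
    static long timeStringIntern(List<String> tokens) {
        long start = System.nanoTime();
        for (String t : tokens) {
            t.intern();
        }
        return System.nanoTime() - start;
    }

    // Elapsed nanoseconds for canonicalizing every token via a HashMap.
    static long timeHashMap(List<String> tokens) {
        Map<String, String> pool = new HashMap<>();
        long start = System.nanoTime();
        for (String t : tokens) {
            if (pool.get(t) == null) {
                pool.put(t, t);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> tokens = makeTokens(1_000_000, 200_000);
        // Run order can matter (warm-up, GC), so also try swapping these two calls.
        long internNanos = timeStringIntern(tokens);
        long mapNanos = timeHashMap(tokens);
        System.out.printf("String.intern(): %d ms%n", internNanos / 1_000_000);
        System.out.printf("HashMap:         %d ms%n", mapNanos / 1_000_000);
    }
}
```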
Answer
The most likely reason for the performance difference: String.intern() is a native method, and calling a native method incurs massive overhead.
So why is it a native method? Probably because it uses the constant pool, which is a low-level VM construct.
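The link between intern() and the pool of string literals can be observed from Java code: string literals are always interned (per the Java Language Specification's rules for string literals), so intern() on a run-time-built string with equal contents returns the very instance used for the literal. The sketch below, with the hypothetical class LiteralPoolDemo, only demonstrates this sharing; it does not measure or confirm the native-call overhead suggested above.

```java
// Shows that String.intern() hands back the same instance the JVM uses for a
// string literal with equal contents: literals and interned strings share one pool.
final class LiteralPoolDemo {
    static final String LITERAL = "constant-pool";

    // Builds an equal string at run time, producing a new String object.
    static String buildAtRuntime() {
        return new StringBuilder("constant").append("-pool").toString();
    }

    public static void main(String[] args) {
        String runtime = buildAtRuntime();
        System.out.println(runtime == LITERAL);          // false: different objects
        System.out.println(runtime.intern() == LITERAL); // true: intern() returns the literal's instance
    }
}
```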