Performance penalty of String.intern()

Question

Lots of people talk about the performance advantages of String.intern(), but I'm actually more interested in what the performance penalty may be.

My main concerns are:

  • Search cost: the time intern() takes to determine whether the string already exists in the string pool. How does that cost scale with the number of strings in the pool?
  • Synchronization: obviously the string pool is shared by the whole JVM. How does it behave when intern() is called over and over from multiple threads? How much locking does it perform? How does performance scale with contention?
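One way to probe the synchronization question empirically is to push strings through the pool from a varying number of threads and compare wall-clock times. The following is only a rough sketch under stated assumptions: the class name, the "isin-" prefix and the workload sizes are made up for illustration, and the measured time also includes string concatenation and thread start-up, so it is an upper bound rather than a pure intern() cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

/** Hypothetical harness (not from the original post) for timing a pool under contention. */
final class ContentionSketch {

    /**
     * Has `threads` workers each push `perThread` freshly built strings, drawn from
     * `distinct` possible values, through `pool`. Returns the wall-clock time in ms.
     */
    static long timeMillis(UnaryOperator<String> pool, int threads, int perThread, int distinct) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    for (int i = 0; i < perThread; i++) {
                        // Runtime concatenation creates a new String instance each time.
                        pool.apply("isin-" + (i % distinct));
                    }
                    return null;
                });
            }
            long start = System.nanoTime();
            for (Future<Void> f : executor.invokeAll(tasks)) {
                f.get(); // wait for completion and surface any worker failure
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            long ms = timeMillis(String::intern, threads, 200_000, 10_000);
            System.out.printf("threads=%d  String.intern(): %d ms%n", threads, ms);
        }
    }
}
```

If the time per thread stays roughly flat as the thread count grows, contention is not the bottleneck; if it climbs, the pool is serializing callers. Results will depend heavily on the JVM version and options.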

I am concerned about all this because I'm currently working on a financial application that uses too much memory because of duplicated Strings. Some strings are essentially enumerated values with a limited set of possible values (such as the currency names "USD" and "EUR"), yet exist in more than a million copies. String.intern() seems like a no-brainer in this case, but I'm worried about the synchronization overhead of calling intern() every time I store a currency somewhere.

On top of that, some other types of strings can have millions of distinct values, yet still have tens of thousands of copies of each (such as ISIN codes). For these, I'm concerned that interning a million strings would slow down the intern() method so much that it bogs down my application.

Recommended answer

I did a little benchmarking myself. For the search cost part, I decided to compare String.intern() with ConcurrentHashMap.putIfAbsent(s,s). Basically, the two methods do the same thing, except that String.intern() is a native method that stores to and reads from a table managed directly inside the JVM (in HotSpot, the StringTable), while ConcurrentHashMap.putIfAbsent() is just a normal instance method.
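To make the comparison concrete, here is a minimal sketch of a heap-based pool built on putIfAbsent(s, s). The MapInterner class and its method names are hypothetical, chosen for illustration; this is not the code from the benchmark.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not from the original post): a string pool kept on the Java heap. */
final class MapInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to s, registering s itself if none exists yet. */
    String intern(String s) {
        String previous = pool.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    /** Number of distinct strings currently held by the pool. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        // new String(...) forces two distinct instances with equal contents.
        String a = interner.intern(new String("USD"));
        String b = interner.intern(new String("USD"));
        System.out.println(a == b); // prints true: both refer to the pooled instance
    }
}
```

One design difference to keep in mind: a map like this holds strong references, so pooled strings are never released unless the application removes them or discards the map.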

You can find the benchmark code in a GitHub gist (for lack of a better place to put it). The comments at the top of the source file also list the options I used when launching the JVM (to verify that the benchmark is not skewed).

Anyway, here are the results:

Legend

  • count: the number of distinct strings that we are trying to pool
  • initial intern: the time in ms it took to insert all the strings into the string pool
  • lookup same string: the time in ms it took to look up each of the strings again from the pool, using exactly the same instance that was previously entered in the pool
  • lookup equal string: the time in ms it took to look up each of the strings again from the pool, but using a different (equal) instance
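The three measured phases can be sketched as follows. This is a simplified reconstruction under stated assumptions, not the gist's code: the class name, the "key-" prefix and the count are arbitrary, and there is no warm-up or JVM tuning, so absolute numbers will differ from the tables.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical timing sketch (not the original benchmark) for the three phases in the legend. */
final class PoolTimingSketch {

    /** Builds `count` distinct, non-pooled strings; the "key-" prefix is an arbitrary choice. */
    static String[] distinctStrings(int count) {
        String[] out = new String[count];
        for (int i = 0; i < count; i++) {
            out[i] = "key-" + i; // runtime concatenation yields a fresh instance
        }
        return out;
    }

    /** Copies each string so the copy is equal to, but not the same instance as, the original. */
    static String[] equalCopies(String[] src) {
        String[] out = new String[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = new String(src[i]);
        }
        return out;
    }

    /** Runs every input through the pool and returns the elapsed time in ms. */
    static long timeMillis(UnaryOperator<String> pool, String[] inputs) {
        long start = System.nanoTime();
        for (String s : inputs) {
            pool.apply(s);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    static void run(String label, UnaryOperator<String> pool, int count) {
        String[] same = distinctStrings(count);
        String[] equal = equalCopies(same);
        long initial = timeMillis(pool, same);      // initial intern
        long lookupSame = timeMillis(pool, same);   // lookup same string
        long lookupEqual = timeMillis(pool, equal); // lookup equal string
        System.out.printf("%-32s count=%d initial=%d ms, same=%d ms, equal=%d ms%n",
                label, count, initial, lookupSame, lookupEqual);
    }

    public static void main(String[] args) {
        int count = 100_000;
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        run("String.intern()", String::intern, count);
        run("ConcurrentHashMap.putIfAbsent()", s -> {
            String previous = map.putIfAbsent(s, s);
            return previous != null ? previous : s;
        }, count);
    }
}
```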

String.intern()

count       initial intern   lookup same string  lookup equal string
1'000'000            40206                34698                35000
  400'000             5198                 4481                 4477
  200'000              955                  828                  803
  100'000              234                  215                  220
   80'000              110                   94                   99
   40'000               52                   30                   32
   20'000               20                   10                   13
   10'000                7                    5                    7

ConcurrentHashMap.putIfAbsent()

count       initial intern   lookup same string  lookup equal string
1'000'000              411                  246                  309
  800'000              352                  194                  229
  400'000              162                   95                  114
  200'000               78                   50                   55
  100'000               41                   28                   28
   80'000               31                   23                   22
   40'000               20                   14                   16
   20'000               12                    6                    7
   10'000                9                    5                    3

The conclusion for the search cost: String.intern() is surprisingly expensive to call. It scales extremely badly, at roughly O(n) per lookup, where n is the number of strings in the pool. As the pool grows, the time to look up one string grows with it: about 0.7 microseconds per lookup with 10'000 strings versus about 40 microseconds per lookup with 1'000'000 strings.

ConcurrentHashMap scales as expected: the number of strings in the pool has essentially no impact on the speed of a lookup.
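The per-operation figures follow directly from the tables: divide the total time by the number of strings. A small sketch of that arithmetic, using the "initial intern" column (the class and method names are hypothetical):

```java
/** Hypothetical helper (not from the original post) for converting the table totals. */
final class PerLookupCost {

    /** Converts a total time in ms for `count` operations into microseconds per operation. */
    static double microsPerLookup(long totalMillis, int count) {
        return totalMillis * 1000.0 / count;
    }

    public static void main(String[] args) {
        // Values taken from the "initial intern" column of the two tables above.
        System.out.printf("String.intern()     10'000: %.2f us%n", microsPerLookup(7, 10_000));
        System.out.printf("String.intern()  1'000'000: %.2f us%n", microsPerLookup(40_206, 1_000_000));
        System.out.printf("putIfAbsent()       10'000: %.2f us%n", microsPerLookup(9, 10_000));
        System.out.printf("putIfAbsent()    1'000'000: %.2f us%n", microsPerLookup(411, 1_000_000));
    }
}
```

With the figures above, this works out to about 0.7 and 40.2 microseconds per operation for String.intern(), against about 0.9 and 0.41 microseconds for ConcurrentHashMap.putIfAbsent().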

Based on this experiment, I'd strongly suggest avoiding String.intern() if you are going to intern more than a few strings.
