Why is String.intern() so slow?

Question

Before anyone questions why I use String.intern() at all, let me say that I need it in my particular application for memory and performance reasons. [1]

So, until now I used String.intern() and assumed it was the most efficient way to do it. However, I have long noticed that it is a bottleneck in the software. [2]

Then, just recently, I tried to replace String.intern() with a huge map into which I put/get the strings, so as to obtain a unique instance each time. I expected this to be slower... but it was exactly the opposite! It was tremendously faster! Replacing intern() with pushing/polling a map (which achieves exactly the same thing) made this step more than an order of magnitude faster.
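Below is a minimal sketch of the map-based replacement described above. It assumes a single-threaded caller. The class name `MapInterner` is hypothetical, because the question's actual code is only available through the pastebin link. Each string is stored as both key and value. Looking up any equal string therefore returns the first instance seen, and that canonical instance can then be compared with `==`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-based interner, as described in the question.
// A plain HashMap is not thread-safe: use it from one thread only.
final class MapInterner {
    // Each canonical string is stored as both key and value.
    private final Map<String, String> pool = new HashMap<String, String>();

    // Returns the first instance seen that equals s, registering s if it is new.
    String intern(String s) {
        String existing = pool.get(s);
        if (existing != null) {
            return existing;
        }
        pool.put(s, s);
        return s;
    }

    // Number of distinct strings held by this interner.
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = interner.intern(new String("token")); // first occurrence: stored
        String b = interner.intern(new String("token")); // equal string: returns a
        System.out.println(a == b);          // true
        System.out.println(interner.size()); // 1
    }
}
```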

The question is: why is intern() so slow?!? Why isn't it simply backed by a map (or actually, just a customized set), which would be tremendously faster? I'm puzzled.
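String.intern() serves every thread in the JVM. A plain `HashMap`, by contrast, is only safe from one thread or under external synchronization. A map-backed interner that is shared across threads therefore needs a concurrent map. The following is one possible sketch, not how the JVM implements its pool. The class name `ConcurrentInterner` is hypothetical. It uses `ConcurrentHashMap.putIfAbsent` so that all callers agree on a single canonical instance. The code sticks to pre-Java-8 syntax to match the JDK 6 setup in the question.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical thread-safe variant of a map-backed interner.
final class ConcurrentInterner {
    private final ConcurrentMap<String, String> pool =
            new ConcurrentHashMap<String, String>();

    String intern(String s) {
        // putIfAbsent returns the value already mapped, or null if s was just inserted.
        String previous = pool.putIfAbsent(s, s);
        return previous == null ? s : previous;
    }

    public static void main(String[] args) throws InterruptedException {
        final ConcurrentInterner interner = new ConcurrentInterner();
        final String[] results = new String[2];
        Thread[] threads = new Thread[2];
        for (int i = 0; i < threads.length; i++) {
            final int idx = i;
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    // Each thread passes its own, separately built "shared" string.
                    results[idx] = interner.intern(new String("shared"));
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() also makes the results visible to this thread
        }
        System.out.println(results[0] == results[1]); // true: one canonical instance
    }
}
```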

[1]: For the unconvinced: this is natural language processing over gigabytes of text. It must avoid holding many instances of the same string, both to keep memory from blowing up and so that strings can be compared by reference, which is fast enough.

[2]: Without it (using normal strings) the task is impossible; with it, this particular step remains the most computationally intensive one.

EDIT:

Due to the surprising interest in this post, here is some code to test it out:

http://pastebin.com/4CD8ac69

And the results of interning a bit more than 1 million strings:

  • HashMap: 4 seconds
  • String.intern(): 54 seconds

To rule out warm-up effects, OS I/O caching and the like, the experiment was repeated with the order of the two benchmarks reversed:

  • String.intern(): 69 seconds
  • HashMap: 3 seconds
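The pastebin code is not reproduced here. The following is a hypothetical micro-benchmark of the same kind, not the author's. It canonicalizes one token list through a `HashMap` and then through String.intern(). The class name, method names, token counts and seed are all illustrative assumptions. Timings depend heavily on the JVM version and settings, so no particular result is implied. As in the question, swapping the order of the two timed calls is a cheap check for warm-up effects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical micro-benchmark (not the pastebin code): canonicalizes the same
// token list once through a HashMap and once through String.intern().
final class InternBenchmark {
    // Builds n fresh String objects drawn from `distinct` possible words,
    // so many tokens are equal without being the same instance.
    static List<String> makeTokens(int n, int distinct, long seed) {
        Random rnd = new Random(seed);
        List<String> tokens = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            tokens.add("w" + rnd.nextInt(distinct)); // runtime concat: new object
        }
        return tokens;
    }

    // Map-based canonicalization; out[i] is the canonical instance for tokens[i].
    static String[] canonicalizeWithMap(List<String> tokens) {
        Map<String, String> pool = new HashMap<String, String>();
        String[] out = new String[tokens.size()];
        for (int i = 0; i < out.length; i++) {
            String s = tokens.get(i);
            String existing = pool.get(s);
            if (existing == null) {
                pool.put(s, s);
                existing = s;
            }
            out[i] = existing;
        }
        return out;
    }

    // Same result, using the JVM-wide pool behind String.intern().
    static String[] canonicalizeWithIntern(List<String> tokens) {
        String[] out = new String[tokens.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = tokens.get(i).intern();
        }
        return out;
    }

    public static void main(String[] args) {
        // Sizes chosen arbitrarily for illustration.
        List<String> tokens = makeTokens(1000000, 50000, 42L);

        long t0 = System.nanoTime();
        canonicalizeWithMap(tokens);
        long t1 = System.nanoTime();
        canonicalizeWithIntern(tokens);
        long t2 = System.nanoTime();

        System.out.printf("HashMap:         %d ms%n", (t1 - t0) / 1000000);
        System.out.printf("String.intern(): %d ms%n", (t2 - t1) / 1000000);
    }
}
```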

As you can see, the difference is very noticeable: more than tenfold. (Measured with 64-bit OpenJDK 1.6.0_22; I think the Sun JVM gave similar results.)

Solution

The most likely reason for the performance difference: String.intern() is a native method, and calling a native method incurs massive overhead.

So why is it a native method? Probably because it uses the constant pool, which is a low-level VM construct.
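The pool that String.intern() consults is shared with string literals. The Javadoc for String.intern() states that all literal strings and string-valued constant expressions are interned. The small illustration below shows the observable effect: interning a separately constructed string returns the very same object as the literal. The class and method names are hypothetical, and the example says nothing about how the pool is implemented internally.

```java
// Illustration of the canonical pool behind String.intern().
final class LiteralPoolDemo {
    // True when interning s yields the very same object as the literal "hello".
    static boolean internsToLiteral(String s) {
        return s.intern() == "hello";
    }

    public static void main(String[] args) {
        String built = new String("hello");          // new, distinct heap object
        System.out.println(built == "hello");        // false: different instances
        System.out.println(built.equals("hello"));   // true: same characters
        System.out.println(internsToLiteral(built)); // true: intern() returns the pooled literal
    }
}
```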
