Why is String.intern() so slow?

Views: 158 | Published: 2018/12/6 12:44:20 | Tags: java, string, performance

Problem Description

Before anyone questions the use of String.intern() at all, let me say that I need it in my particular application for memory and performance reasons. ^[1]

So, until now I used String.intern() and assumed it was the most efficient way to do it. However, I have noticed for a long time that it is a bottleneck in the software. ^[2]

Then, just recently, I tried replacing String.intern() with a huge map in which I put/get the strings in order to obtain a unique instance each time. I expected this to be slower... but it was exactly the opposite! It was tremendously faster! Replacing intern() with put/get on a map (which achieves exactly the same result) made this step more than an order of magnitude faster.
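The map-based replacement described above might look roughly like the following. This is a minimal sketch, not the asker's actual code (which is only available through the pastebin link): the class name `MapInterner` and its methods are hypothetical, and it assumes Java 8 or later for `Map.putIfAbsent`, which is newer than the JDK 6 used in the question.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical map-backed stand-in for String.intern(); not the asker's code. */
final class MapInterner {
    // Maps each distinct string value to the one instance kept as canonical.
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to s, registering s if it is the first seen. */
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct string values currently held. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = new String("token"); // two distinct objects with equal contents
        String b = new String("token");
        System.out.println(a == b);                                   // false
        System.out.println(interner.intern(a) == interner.intern(b)); // true
    }
}
```

Two differences from String.intern() are worth keeping in mind with a sketch like this: a plain `HashMap` is not thread-safe, and it holds strong references, so pooled strings are never released while the map is reachable.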

The question is: why is intern() so slow?!? Why isn't it simply backed by a map (or actually, just a customized set), which would be tremendously faster? I'm puzzled.

[1]: For the unconvinced: this is natural language processing that has to handle gigabytes of text, so it needs to avoid many instances of the same string, both to keep memory from blowing up and to make string comparison by reference possible, which is fast enough.

[2]: Without it (plain strings) the task is impossible; with it, this particular step remains the most computation-intensive one.

EDIT:

Due to the surprising interest in this post, here is some code to test it out:

http://pastebin.com/4CD8ac69

And the results of interning a bit more than 1 million strings:

HashMap: 4 seconds
String.intern(): 54 seconds

To rule out warm-up effects, OS I/O caching and the like, the experiment was repeated with the order of the two benchmarks inverted:

String.intern(): 69 seconds
HashMap: 3 seconds

As you can see, the difference is very noticeable, more than tenfold. (Using OpenJDK 1.6.0_22, 64-bit... but the Sun JDK gave similar results, I think.)
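For readers who cannot reach the pastebin link, a comparison of this kind could be set up roughly as follows. This is only a sketch under stated assumptions: the class `InternBenchmarkSketch`, its method names, and the synthetic `"word-N"` input are all invented for illustration and are not the original test code, and it assumes Java 8 or later. It prints timings but none are quoted here, since they depend heavily on the JVM version and settings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical timing harness; not the code behind the numbers quoted above. */
final class InternBenchmarkSketch {
    /** Builds n fresh String objects that take only `distinct` different values. */
    static List<String> makeInput(int n, int distinct) {
        List<String> input = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Run-time concatenation yields a new, not-yet-interned object each time.
            input.add("word-" + (i % distinct));
        }
        return input;
    }

    /** Deduplicates through a HashMap; returns elapsed nanoseconds. */
    static long timeHashMap(List<String> input, String[] out) {
        Map<String, String> pool = new HashMap<>();
        long start = System.nanoTime();
        for (int i = 0; i < out.length; i++) {
            String s = input.get(i);
            String prev = pool.putIfAbsent(s, s);
            out[i] = prev != null ? prev : s;
        }
        return System.nanoTime() - start;
    }

    /** Deduplicates through String.intern(); returns elapsed nanoseconds. */
    static long timeIntern(List<String> input, String[] out) {
        long start = System.nanoTime();
        for (int i = 0; i < out.length; i++) {
            out[i] = input.get(i).intern();
        }
        return System.nanoTime() - start;
    }

    /** Counts distinct object identities, to check both approaches deduplicate equally. */
    static int countInstances(String[] strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int distinct = 200_000; // assumed ratio; the original data set is unknown
        List<String> input = makeInput(n, distinct);
        String[] viaMap = new String[n];
        String[] viaIntern = new String[n];
        long mapNs = timeHashMap(input, viaMap);
        long internNs = timeIntern(input, viaIntern);
        System.out.printf("HashMap: %d ms, %d instances%n",
                mapNs / 1_000_000, countInstances(viaMap));
        System.out.printf("String.intern(): %d ms, %d instances%n",
                internNs / 1_000_000, countInstances(viaIntern));
    }
}
```

A single timed pass like this is crude; swapping the order of the two calls, as done above, or using a harness such as JMH gives more trustworthy numbers.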

Solution

Most likely reason for the performance difference: String.intern() is a native method, and calling a native method incurs massive overhead.

So why is it a native method? Probably because it uses the constant pool, which is a low-level VM construct.
