Is there a memory-efficient replacement of java.lang.String?

Question

After reading this old article measuring the memory consumption of several object types, I was amazed to see how much memory Strings use in Java:

length: 0, {class java.lang.String} size = 40 bytes
length: 7, {class java.lang.String} size = 56 bytes

While the article has some tips to minimize this, I did not find them entirely satisfying. It seems to be wasteful to use char[] for storing the data. The obvious improvement for most western languages would be to use byte[] and an encoding like UTF-8 instead, as you only need a single byte to store the most frequent characters then instead of two bytes.

Of course one could use String.getBytes("UTF-8") and new String(bytes, "UTF-8"). Even the overhead of the String instance itself would be gone. But then you lose very handy methods like equals(), hashCode(), length(), ...

Sun has a patent on byte[] representation of Strings, as far as I can tell.

Frameworks for efficient representation of string objects in Java programming environments
... The techniques can be implemented to create Java string objects as arrays of one-byte characters when it is appropriate ...

But I failed to find an API for that patent.

Why do I care?
In most cases I don't. But I worked on applications with huge caches, containing lots of Strings, which would have benefitted from using the memory more efficiently.

Does anybody know of such an API? Or is there another way to keep your memory footprint for Strings small, even at the cost of CPU performance or uglier API?

Please don't repeat the suggestions from the above article:

  • own variant of String.intern() (possibly with SoftReferences)
  • storing a single char[] and exploiting the current String.substring() implementation to avoid data copying (nasty)

Update

I ran the code from the article on Sun's current JVM (1.6.0_10). It yielded the same results as in 2002.

Recommended answer

With a Little Bit of Help From the JVM...

WARNING: This solution is now obsolete in newer Java SE versions. See other ad-hoc solutions further below.

If you use a HotSpot JVM, since Java 6 update 21, you can use this command-line option:

-XX:+UseCompressedStrings

The JVM options page reads:

Use a byte[] for Strings which can be represented as pure ASCII. (Introduced in Java 6 Update 21 Performance Release)

UPDATE: This feature was broken in a later version and was supposed to be fixed again in Java SE 6u25, as mentioned in the 6u25 b03 release notes (however, we don't see it in the 6u25 final release notes). The bug report 7016213 is not visible for security reasons. So, use with care and check first. Like any -XX option, it is deemed experimental and subject to change without much notice, so it's probably best not to use it in the startup script of a production server.

UPDATE 2013-03 (thanks to a comment by Aleksey Maximus): See this related question and its accepted answer. The option now seems to be deceased. This is further confirmed in the bug 7129417 report.

Warning: (Ugly) Solutions for Specific Needs

This is a bit out of the box and lower level, but since you asked... don't shoot the messenger!

If ASCII is fine for your needs, then why don't you just roll out your own implementation?

As you mentioned, you could use byte[] instead of char[] internally. But that's not all.

To make it even more lightweight, instead of wrapping your byte arrays in a class, why not simply use a helper class containing mostly static methods operating on these byte arrays that you pass around? Sure, it's going to feel pretty C-ish, but it would work, and would save you the huge overhead that goes with String objects.
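
As a rough illustration of that idea, here is a minimal sketch of such a helper. The class name AsciiStrings and its method names are made up for this example, and it assumes pure 7-bit ASCII content and Java 7+ (for StandardCharsets); it is not a complete String replacement.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: static methods operating on ASCII-encoded byte arrays. */
final class AsciiStrings {
    private AsciiStrings() {}

    /** Encodes a String as one byte per character; rejects anything outside 7-bit ASCII. */
    static byte[] of(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            char c = s.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    /** Converts back to a regular String when one is really needed. */
    static String toString(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    static int length(byte[] b) {
        return b.length;
    }

    static char charAt(byte[] b, int index) {
        return (char) (b[index] & 0xFF);
    }

    static boolean equals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    /** Same polynomial as String.hashCode(), so ASCII content hashes identically. */
    static int hashCode(byte[] b) {
        int h = 0;
        for (byte x : b) {
            h = 31 * h + (x & 0xFF);
        }
        return h;
    }
}
```

Note that a raw byte[] uses identity-based equals() and hashCode(), so it cannot serve directly as a HashMap key; a cache built on this sketch would have to go through the static equals and hashCode methods itself.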

And sure, it would miss some nice functionality... unless you re-implement it. If you really need it, then there's not much choice. Thanks to OpenJDK and a lot of other good projects, you could very well roll out your own fugly LiteStrings class that just operates on byte[] parameters. You'll feel like taking a shower every time you need to call a function, but you'll have saved heaps of memory.

I'd recommend making it closely resemble the String class's contract and providing meaningful adapters and builders to convert from and to String. You might also want adapters to and from StringBuffer and StringBuilder, as well as mirror implementations of other things you might need. Definitely some piece of work, but it might be worth it (see a bit below the "Make it Count!" section).
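
The adapters could look something like the sketch below. The class LiteStringAdapters and its methods are hypothetical names chosen for this example, and ASCII-only content is assumed. Because String, StringBuffer and StringBuilder all implement CharSequence, a single conversion method covers the "from" direction for all three.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical adapters between ASCII byte arrays and the standard string types. */
final class LiteStringAdapters {
    private LiteStringAdapters() {}

    /** Accepts String, StringBuffer, StringBuilder or any other CharSequence. */
    static byte[] fromCharSequence(CharSequence cs) {
        byte[] out = new byte[cs.length()];
        for (int i = 0; i < out.length; i++) {
            char c = cs.charAt(i);
            if (c > 127) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    /** Materializes a temporary String, for example to call an API that requires one. */
    static String toString(byte[] ascii) {
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    /** Appends the bytes to an existing builder without creating an intermediate String. */
    static StringBuilder appendTo(StringBuilder sb, byte[] ascii) {
        sb.ensureCapacity(sb.length() + ascii.length);
        for (byte b : ascii) {
            sb.append((char) (b & 0xFF));
        }
        return sb;
    }
}
```

A typical use would be to keep the byte[] form in the cache and convert only at the edges, for instance `LiteStringAdapters.appendTo(new StringBuilder("key="), cachedBytes)`.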

You could very well compress your strings in memory and decompress them on the fly when you need them. After all, you only need to be able to read them when you access them, right?

Of course, being that violent will mean:

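  • more complex (and thus less maintainable) code,
  • more processing power,
  • relatively long strings are needed for the compression to be relevant (or compacting multiple strings into one by implementing your own store system, to make the compression more effective).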
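
For what it's worth, in-memory compression can be sketched with the JDK's own java.util.zip classes. The class CompressedText below is a made-up name for this example, and the choice of UTF-8 plus DEFLATE is an assumption rather than anything prescribed above; short strings will usually come out larger than they went in.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: keep text compressed in memory, inflate it on access. */
final class CompressedText {
    private CompressedText() {}

    static byte[] compress(String s) {
        byte[] input = s.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported data");
                }
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Every read now pays for a full decompression, so this only makes sense for values that are long, compressible and rarely accessed; measure before committing to it.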

For a full headache, of course you can do all of that:

  • C-ish helper class,
  • byte arrays,
  • on-the-fly compressed store.

Be sure to make it open source. :)

By the way, see this great presentation on Building Memory-Efficient Java Applications by N. Mitchell and G. Sevitsky: [2008 version], [2009 version].

From this presentation, we see that an 8-char string eats 64 bytes on a 32-bit system (96 for a 64-bit system!!), and most of it is due to JVM overhead. And from this article we see that an 8-byte array would eat "only" 24 bytes (12 bytes of header, 8 x 1 byte + 4 bytes of alignment).
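
The 24-byte figure is just the header plus the payload, rounded up to the alignment boundary. A small sketch of that arithmetic follows; the 12-byte header and 8-byte alignment are the values quoted above and are passed in as parameters, since real values depend on the JVM and its settings. The class and method names are made up for this example.

```java
/** Hypothetical estimator for the heap footprint of a byte[]. */
final class ArraySizeEstimate {
    private ArraySizeEstimate() {}

    /** Header plus one byte per element, rounded up to the next multiple of alignment. */
    static long byteArraySize(int length, int headerBytes, int alignment) {
        long raw = (long) headerBytes + length;
        return (raw + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        // 12 + 8 = 20 bytes, padded with 4 bytes to reach 24
        System.out.println(byteArraySize(8, 12, 8)); // prints 24
    }
}
```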

Sounds like this could be worth it if you really manipulate a lot of that stuff (and possibly speed up things a bit, as you'd spend less time allocating memory, but don't quote me on that and benchmark it; plus it would depend greatly on your implementation).
