Is there a memory-efficient replacement of java.lang.String?


Question

After reading this old article measuring the memory consumption of several object types, I was amazed to see how much memory Strings use in Java:

length: 0, {class java.lang.String} size = 40 bytes
length: 7, {class java.lang.String} size = 56 bytes

While the article has some tips to minimize this, I did not find them entirely satisfying. Using char[] to store the data seems wasteful. The obvious improvement for most Western languages would be to use byte[] with an encoding like UTF-8, since the most frequent characters then need only a single byte instead of two.

Of course one could use String.getBytes("UTF-8") and new String(bytes, "UTF-8"). Even the overhead of the String instance itself would be gone. But then you lose very handy methods like equals(), hashCode(), length(), ...
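To make the trade-off concrete, here is a minimal sketch of that round trip. The class name Utf8Bytes is made up for illustration, and StandardCharsets needs Java 7 or later (on Java 6 you would pass the charset name "UTF-8" instead). It shows that java.util.Arrays can stand in for equals() and hashCode(), while the array length is a byte count, not a character count.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class Utf8Bytes {
    private Utf8Bytes() {}

    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] a = encode("cache-key");
        byte[] b = encode("cache-key");

        // Arrays inherit Object.equals(), which compares identity.
        System.out.println(a.equals(b));          // false
        // Content-based replacements for equals() and hashCode().
        System.out.println(Arrays.equals(a, b));  // true
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // true

        // Length is in bytes: the 5-character "h\u00e9llo" takes 6 bytes in UTF-8.
        System.out.println(encode("h\u00e9llo").length); // 6
    }
}
```

Note that a bare byte[] cannot be used directly as a HashMap key, because the map would call the identity-based equals() and hashCode() of the array.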

Sun has a patent on byte[] representation of Strings, as far as I can tell.

Frameworks for efficient representation of string objects in Java programming environments
... The techniques can be implemented to create Java string objects as arrays of one-byte characters when it is appropriate ...

But I failed to find an API for that patent.

Why do I care?
In most cases I don't. But I worked on applications with huge caches, containing lots of Strings, which would have benefitted from using the memory more efficiently.

Does anybody know of such an API? Or is there another way to keep your memory footprint for Strings small, even at the cost of CPU performance or uglier API?

Please don't repeat the suggestions from the above article:

  • own variant of String.intern() (possibly with SoftReferences)
  • storing a single char[] and exploiting the current String.substring(..) implementation to avoid data copying (nasty)

Update

I ran the code from the article on Sun's current JVM (1.6.0_10). It yielded the same results as in 2002.

Answer

With a Little Bit of Help From the JVM...

WARNING: This solution is now obsolete in newer Java SE versions. See other ad-hoc solutions further below.

If you use a HotSpot JVM, since Java 6 update 21, you can use this command-line option:

-XX:+UseCompressedStrings

The JVM Options page reads:

Use a byte[] for Strings which can be represented as pure ASCII. (Introduced in Java 6 Update 21 Performance Release)

UPDATE: This feature was broken in a later version and was supposed to be fixed again in Java SE 6u25, as mentioned in the 6u25 b03 release notes (however, we don't see it in the 6u25 final release notes). The bug report 7016213 is not visible for security reasons. So use with care and check first. Like any -XX option, it is deemed experimental and subject to change without much notice, so it's probably best not to use it in the startup script of a production server.

UPDATE 2013-03 (thanks to a comment by Aleksey Maximus): See this related question and its accepted answer. The option now seems to be deceased. This is further confirmed in the bug 7129417 report.

The End Justifies the Means

Warning: (Ugly) Solutions for Specific Needs

This is a bit out of the box and lower-level, but since you asked... don't shoot the messenger!

Your Own Lighter String Representation

If ASCII is fine for your needs, then why don't you just roll out your own implementation?

As you mentioned, you could use byte[] instead of char[] internally. But that's not all.

To make it even more lightweight, instead of wrapping your byte arrays in a class, why not simply use a helper class containing mostly static methods that operate on the byte arrays you pass around? Sure, it's going to feel pretty C-ish, but it would work, and it would save you the huge overhead that goes with String objects.

And sure, it would miss some nice functionality... unless you re-implement it. If you really need it, then there's not much choice. Thanks to OpenJDK and a lot of other good projects, you could very well roll out your own fugly LiteStrings class that just operates on byte[] parameters. You'll feel like taking a shower every time you need to call a function, but you'll have saved heaps of memory.

I'd recommend making it closely resemble the String class's contract and providing meaningful adapters and builders to convert to and from String. You might also want adapters to and from StringBuffer and StringBuilder, as well as mirror implementations of other things you might need. It is definitely a piece of work, but it might be worth it (see the "Make it Count!" section below).
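As a rough idea of what such a helper could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the method set, the choice to reject non-ASCII input, and the choice to reuse the polynomial of String.hashCode() so that hashes agree with the equivalent String. StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch of a static helper operating on ASCII byte arrays.
final class LiteStrings {
    private LiteStrings() {}

    // Adapter from String; rejects anything outside 7-bit ASCII.
    static byte[] of(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    // Adapter back to String.
    static String toString(byte[] s) {
        return new String(s, StandardCharsets.US_ASCII);
    }

    static int length(byte[] s) {
        return s.length;
    }

    // ASCII bytes are in 0..127, so the widening cast is safe.
    static char charAt(byte[] s, int index) {
        return (char) s[index];
    }

    static boolean equals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    // Same polynomial as String.hashCode(), so ASCII content hashes identically.
    static int hashCode(byte[] s) {
        int h = 0;
        for (byte b : s) {
            h = 31 * h + b;
        }
        return h;
    }

    static int indexOf(byte[] s, char c) {
        for (int i = 0; i < s.length; i++) {
            if (s[i] == c) {
                return i;
            }
        }
        return -1;
    }

    // Copies the range [begin, end), mirroring String.substring(begin, end).
    static byte[] substring(byte[] s, int begin, int end) {
        return Arrays.copyOfRange(s, begin, end);
    }
}
```

A call site would then read something like LiteStrings.indexOf(name, '@') rather than name.indexOf('@'), which is the C-ish feel mentioned above.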

On-the-Fly Compression/Decompression

You could very well compress your strings in memory and decompress them on the fly when you need them. After all, you only need to be able to read them when you access them, right?

Of course, being that drastic means:

  • more complex (thus less maintainable) code,
  • more processing power,
  • a need for relatively long strings for the compression to pay off (or for packing multiple strings into one by implementing your own store, to make the compression more effective).
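One possible way to do this with the standard library is java.util.zip.Deflater and Inflater. The sketch below is only an illustration under assumptions of my own: the class name CompressedText, UTF-8 as the encoding, the buffer sizes, and wrapping the checked DataFormatException in an IllegalArgumentException. Whether it saves anything depends on string length and content, so measure before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative sketch: keep text compressed in memory, inflate on access.
final class CompressedText {
    private CompressedText() {}

    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(16, input.length / 2));
            byte[] buffer = new byte[512];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length * 2);
            byte[] buffer = new byte[512];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and nothing left to read: the data is truncated or invalid.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

For short strings the deflate framing can make the result larger than the input, which is why the last bullet above suggests packing several strings into one compressed block.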

Do Both

For the full headache, you can of course do all of that:

  • C-ish helper class,
  • byte arrays,
  • on-the-fly compressed store.

Be sure to make that open-source. :)

Make it Count!

By the way, see this great presentation on Building Memory-Efficient Java Applications by N. Mitchell and G. Sevitsky: [2008 version], [2009 version].

From this presentation, we see that an 8-char string eats 64 bytes on a 32-bit system (96 on a 64-bit system!), and most of that is JVM overhead. And from this article we see that an 8-byte array would eat "only" 24 bytes: 12 bytes of header, 8 x 1 byte of data, and 4 bytes of alignment padding.
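The arithmetic behind those numbers can be written down as a small sketch. The 12-byte array header, the 8-byte alignment, and the 64-byte String figure are simply the values quoted above for a 32-bit JVM; other JVMs and settings will differ, so treat the constants as assumptions rather than facts about your runtime.

```java
// Back-of-the-envelope estimate using the figures quoted in the text.
final class FootprintEstimate {
    private FootprintEstimate() {}

    static final int ARRAY_HEADER_BYTES = 12; // assumed, from the cited article
    static final int ALIGNMENT_BYTES = 8;     // assumed object alignment

    // Rounds size up to the next multiple of alignment.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Estimated heap size of a byte[] holding `length` bytes.
    static long byteArraySize(int length) {
        return alignUp(ARRAY_HEADER_BYTES + length, ALIGNMENT_BYTES);
    }

    public static void main(String[] args) {
        long asArray = byteArraySize(8); // 12 + 8 = 20, padded to 24
        long asString = 64;              // 8-char String on a 32-bit JVM, per the presentation
        long count = 1000000L;

        System.out.println(asArray);                      // 24
        System.out.println((asString - asArray) * count); // 40000000 bytes saved per million strings
    }
}
```

Under these assumptions a cache of one million 8-character ASCII strings would shrink by roughly 40 MB, not counting the references that point at each array.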

Sounds like this could be worth it if you really manipulate a lot of that stuff (and possibly speed up things a bit, as you'd spend less time allocating memory, but don't quote me on that and benchmark it; plus it would depend greatly on your implementation).
