Is string interning really useful?

Question

I was having a conversation about strings and various languages a while back, and the topic of string interning came up. Apparently Java and the .NET framework do this automatically with all strings, as well as several scripting languages. Theoretically, it saves memory because you don't end up with multiple copies of the same string, and it saves time because string equality comparisons are a simple pointer comparison instead of an O(N) run through each character of the string.

But the more I think about it, the more skeptical I grow of the concept's benefits. It seems to me that the advantages are mostly theoretical:

  • First off, to use automatic string interning, all strings must be immutable, which makes a lot of string processing tasks harder than they need to be. (And yes, I've heard all the arguments for immutability in general. That's not the point.)
  • Every time a new string is created, it has to be checked against the string interning table, which is at least an O(N) operation. (Where N is the size of the string, not the size of the table, since this was confusing people.) So unless the ratio of string equality comparisons to new string creations is pretty high, it's unlikely that the net time saved is positive.
  • If the string intern table uses strong references, the strings will never get garbage collected when they're no longer needed, thus wasting memory. On the other hand, if the table uses weak references, then the string class requires some sort of finalizer to remove the string from the table, thus slowing down the GC process. (Which could be pretty significant, depending on how the string intern table is implemented. Worst case, deleting an item from a hash table can require an O(N) rebuild of the entire table under certain circumstances.)
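
To make the weak-reference variant in the last bullet concrete, here is a minimal sketch of such a table in Java. The class name WeakInternTable and its intern method are hypothetical and made up for illustration; this is not how the JVM or the CLR implement their own pools. It assumes java.util.WeakHashMap is an acceptable backing store, which clears stale entries through a reference queue during later map operations rather than through a finalizer on the string class.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical user-level intern table that holds its strings weakly.
public class WeakInternTable {
    // Keys are weakly held by WeakHashMap. Each value is wrapped in a
    // WeakReference so the value does not keep its own key strongly reachable.
    private final Map<String, WeakReference<String>> table = new WeakHashMap<>();

    // Returns the canonical instance equal to s, registering s if none exists.
    public synchronized String intern(String s) {
        // The lookup hashes s, which costs time proportional to the length
        // of s (Java caches the hash code on the String after first use).
        WeakReference<String> ref = table.get(s);
        String canonical = (ref != null) ? ref.get() : null;
        if (canonical != null) {
            return canonical;
        }
        table.put(s, new WeakReference<>(s));
        return s;
    }
}
```

With this sketch, two equal strings passed to intern come back as the same object for as long as the first one is still reachable elsewhere; once nothing else refers to it, the entry becomes eligible for collection.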

This is just the result of me thinking about implementation details. Is there something I've missed? Does string interning actually provide any significant benefits in the general case?

EDIT 2: All right, apparently I was operating from a mistaken premise. The person I was talking to never pointed out that string interning was optional for newly-created strings, and in fact gave the strong impression that the opposite was true. Thanks to Jon for setting the matter straight. Another accepted answer for him.

Recommended answer

No, Java and .NET don't do it "automatically with all strings". They (well, Java and C#) do it with constant string expressions expressed in bytecode/IL, and on demand via the String.intern (http://download.oracle.com/javase/6/docs/api/java/lang/String.html#intern%28%29) and String.Intern (.NET) (http://msdn.microsoft.com/en-us/library/system.string.intern.aspx) methods. The exact situation in .NET is interesting, but basically the C# compiler will guarantee that every reference to an equal string constant within an assembly ends up referring to the same string object. That can be done efficiently at type initialization time, and can save a bunch of memory.
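
The Java side of this can be checked with a few lines. The sketch below is only an illustration (the class InternDemo and its helper buildAtRuntime are hypothetical names): string literals and compile-time constant expressions share one object, a string assembled at run time is a separate object, and calling String.intern on it returns the shared one.

```java
public class InternDemo {
    // Builds a string at run time, so the result is not a compile-time constant.
    static String buildAtRuntime(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String constantExpr = "hel" + "lo";            // folded to "hello" by the compiler
        String runtime = buildAtRuntime("hel", "lo");  // new object, not interned

        System.out.println(literal == constantExpr);      // prints true
        System.out.println(literal == runtime);           // prints false
        System.out.println(literal.equals(runtime));      // prints true
        System.out.println(literal == runtime.intern());  // prints true
    }
}
```

The reference comparison with == only behaves like an equality test when both sides are known to be interned, which is why ordinary code compares strings with equals.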

It doesn't happen every time a new string is created.

(On the string immutability front, I for one am extremely glad that strings are immutable. I don't want to have to take a copy every time I receive a parameter etc., thank you very much. I haven't seen it make string processing tasks harder, either...)

And as others have pointed out, looking up a string in a hash table isn't generally an O(n) operation, unless you're incredibly unlucky with hash collisions...

Personally I don't use string interning in user-land code; if I want some sort of cache of strings I'll create a HashSet<string> or something similar. That can be useful in various situations where you expect to come across the same strings several times (e.g. XML element names) but with a simple collection you don't pollute a system-wide cache.
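
A rough Java equivalent of that idea is sketched below. It uses a HashMap rather than a HashSet, because java.util.HashSet has no method that hands back the stored instance; the class name LocalStringCache and its methods are hypothetical and only meant as an illustration of a cache scoped to one component.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local cache: canonicalizes equal strings without
// touching the JVM-wide intern pool.
public class LocalStringCache {
    // Maps each distinct string to the first instance seen with that content.
    private final Map<String, String> cache = new HashMap<>();

    // Returns the cached instance equal to s, storing s if it is new.
    public String canonicalize(String s) {
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct strings currently held.
    public int size() {
        return cache.size();
    }
}
```

A parser could keep one such cache per document, pass every element name through canonicalize, and drop the whole cache when parsing finishes, so the strings become collectable along with it.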
