Comparison of substring operation performance between .NET and Java
Question
Taking a substring of a string is a very common string manipulation operation, but I have heard that there may be considerable differences in performance and implementation between the Java and .NET platforms. Specifically, I heard that in Java, java.lang.String offers a constant-time substring operation, but in .NET, System.String offers a linear-time Substring.
Is this really the case? Can it be confirmed in the documentation, source code, etc.? Is this implementation-specific, or specified by the language and/or platform? What are the pros and cons of each approach? What should a person migrating from one platform to the other look for to avoid falling into any performance pitfalls?
Recommended answer
In .NET, Substring is O(n) rather than the O(1) of Java. This is because in .NET, the String object contains all the actual character data itself [1], so taking a substring involves copying all the data within the new substring. In Java, substring can just create a new object referring to the original char array, with a different starting index and length.
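To make the difference concrete, here is a minimal Java sketch of the two designs. SharedString and CopyingString are hypothetical names invented for illustration; they are not the real java.lang.String or System.String source. The first shares the backing array and only adjusts an offset and length, the second copies the characters it needs. As far as I know, JDK 7u6 and later also copy in String.substring, so the sharing variant reflects the older Java behaviour described above.

```java
import java.util.Arrays;

// Hypothetical sketch of the two substring strategies; not real library source.
class SubstringStrategies {

    // Older-Java style: a view onto a shared char[] with an offset and a length.
    static final class SharedString {
        final char[] value;   // shared with the parent string
        final int offset;
        final int count;

        SharedString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // O(1): allocates one small object, copies no characters.
        SharedString substring(int begin, int end) {
            if (begin < 0 || end > count || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            return new SharedString(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    // .NET style: every string owns exactly its own characters.
    static final class CopyingString {
        final char[] value;

        CopyingString(char[] value) {
            this.value = value;
        }

        // O(n) in the length of the substring: the characters are copied.
        CopyingString substring(int begin, int end) {
            if (begin < 0 || end > value.length || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            return new CopyingString(Arrays.copyOfRange(value, begin, end));
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    public static void main(String[] args) {
        char[] data = "hello, world".toCharArray();
        SharedString shared = new SharedString(data, 0, data.length).substring(7, 12);
        CopyingString copied = new CopyingString(data).substring(7, 12);
        System.out.println(shared + " shares array: " + (shared.value == data));  // world shares array: true
        System.out.println(copied + " shares array: " + (copied.value == data));  // world shares array: false
    }
}
```

The sketch uses Java's begin/end index convention for both classes to keep them comparable; the real .NET Substring takes a start index and a length.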
There are pros and cons of each approach:
- .NET's approach has better cache locality, creates fewer objects [2], and avoids the situation where one small substring prevents a very large char[] from being garbage collected. I believe in some cases it can make interop very easy too, internally.
- Java's approach makes taking a substring very efficient, and probably some other operations too.
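The garbage-collection point can be sketched as follows. Slice is a hypothetical stand-in for a string that shares its parent's array, and compact() illustrates the usual workaround of copying just the visible characters so that the large array is no longer referenced. This is an illustration under those assumptions, not the JDK's implementation.

```java
import java.util.Arrays;

// Hypothetical sketch: a tiny slice keeping a huge backing array reachable.
class SliceRetention {

    static final class Slice {
        private final char[] backing;
        private final int offset;
        private final int length;

        Slice(char[] backing, int offset, int length) {
            this.backing = backing;
            this.offset = offset;
            this.length = length;
        }

        int length() {
            return length;
        }

        // Number of chars this slice keeps alive, which may far exceed length().
        int retainedChars() {
            return backing.length;
        }

        // Copy only the visible characters; the result no longer references the original array.
        Slice compact() {
            return new Slice(Arrays.copyOfRange(backing, offset, offset + length), 0, length);
        }

        @Override
        public String toString() {
            return new String(backing, offset, length);
        }
    }

    public static void main(String[] args) {
        char[] huge = new char[1_000_000];
        Arrays.fill(huge, 'x');
        Slice small = new Slice(huge, 0, 3);
        System.out.println(small.retainedChars());            // 1000000
        System.out.println(small.compact().retainedChars());  // 3
    }
}
```

With real strings on older JVMs, the analogous idiom, as I recall, was to wrap the result in new String(...) so that only the needed characters were kept.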
There's a little more detail in my strings article.
As for the general question of avoiding performance pitfalls, I think I should have a canned answer ready to cut and paste: make sure your architecture is efficient, and implement it in the most readable way you can. Measure the performance, and optimise where you find bottlenecks.
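As a rough illustration of the "measure" step, the sketch below times substring calls on source strings of different lengths. The helper name nanosPerSubstring is made up for this example, and a hand-rolled loop like this is easily distorted by JIT warm-up and garbage collection, so treat the numbers as indicative only; a proper harness such as JMH is the safer choice for real decisions.

```java
import java.util.Arrays;

// Rough timing sketch; a benchmarking harness gives far more trustworthy numbers.
class SubstringTiming {

    // Hypothetical helper: average nanoseconds per substring call for a given source length.
    static double nanosPerSubstring(int sourceLength, int iterations) {
        char[] chars = new char[sourceLength];
        Arrays.fill(chars, 'x');
        String source = new String(chars);

        long sink = 0;  // consume each result so the work cannot be optimised away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += source.substring(1).length();
        }
        long elapsed = System.nanoTime() - start;

        // Sanity check: every substring should have been one char shorter than the source.
        if (sink != (long) iterations * (sourceLength - 1)) {
            throw new IllegalStateException("unexpected substring lengths");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        nanosPerSubstring(10_000, 10_000);  // warm-up run, result ignored
        for (int length : new int[] {1_000, 10_000, 100_000}) {
            System.out.printf("length %d: %.1f ns per substring%n",
                    length, nanosPerSubstring(length, 10_000));
        }
    }
}
```

If the time per call grows with the source length, substring is copying; if it stays roughly flat, it is sharing.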
[1] Incidentally, this makes string very special - it's the only non-array type whose memory footprint varies by instance within the same CLR.
[2] For small strings, this is a big win. It's bad enough that there's all the overhead of one object, but when there's an extra array involved as well, a single-character string could take around 36 bytes in Java. (That's a "finger-in-the-air" number - I can't remember the exact object overheads. It will also depend on the VM you're using.)