If strings are immutable in .NET, then why does Substring take O(n) time?

Question

Given that strings are immutable in .NET, I'm wondering why they have been designed such that string.Substring() takes O(substring.Length) time, instead of O(1)?

i.e. what were the tradeoffs, if any?

Recommended answer

UPDATE: I liked this question so much, I just blogged it. See Strings, immutability and persistence: http://blogs.msdn.com/b/ericlippert/archive/2011/07/19/strings-immutability-and-persistence.aspx

The short answer is: O(n) is O(1) if n does not grow large. Most people extract tiny substrings from tiny strings, so how the complexity grows asymptotically is completely irrelevant.

The long answer is:

An immutable data structure built such that operations on an instance permit re-use of the memory of the original with only a small amount (typically O(1) or O(lg n)) of copying or new allocation is called a "persistent" immutable data structure. Strings in .NET are immutable; your question is essentially "why are they not persistent"?
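
To make the definition concrete, here is a minimal sketch of what a persistent substring could look like. It is written in Python rather than C# and is not how .NET implements anything; the `StringView` class and its field names are hypothetical. The view keeps a reference to the original string plus an offset and a length, so taking a substring copies no characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringView:
    """Hypothetical persistent substring that shares the original's storage."""
    source: str   # the original string, never copied
    start: int    # offset of the first character within source
    length: int   # number of characters visible through the view

    def substring(self, start: int, length: int) -> "StringView":
        # O(1): three fields are written, no characters are copied.
        if start < 0 or length < 0 or start + length > self.length:
            raise ValueError("substring out of range")
        return StringView(self.source, self.start + start, length)

    def materialize(self) -> str:
        # O(length): the copy that a non-persistent substring always pays.
        return self.source[self.start:self.start + self.length]


text = "hello, persistent world"
whole = StringView(text, 0, len(text))
word = whole.substring(7, 10)
print(word.materialize())                  # persistent
print(word.substring(0, 7).materialize())  # persist
print(word.source is text)                 # True: the original storage is shared
```

The price of the O(1) substring is that every view keeps the whole original string reachable, and anything that needs a real string has to call `materialize` anyway.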

Because when you look at operations that are typically done on strings in .NET programs, it is in every relevant way hardly worse at all to simply make an entirely new string. The expense and difficulty of building a complex persistent data structure doesn't pay for itself.

People typically use "substring" to extract a short string -- say, ten or twenty characters -- out of a somewhat longer string -- maybe a couple hundred characters. You have a line of text in a comma-separated file and you want to extract the third field, which is a last name. The line will be maybe a couple hundred characters long, the name will be a couple dozen. String allocation and memory copying of fifty bytes is astonishingly fast on modern hardware. That making a new data structure that consists of a pointer to the middle of an existing string plus a length is also astonishingly fast is irrelevant; "fast enough" is by definition fast enough.
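
The scenario above can be sketched as follows, in Python for brevity. The sample line and the `third_field` helper are made up for illustration. The slice produces a brand-new small string (in CPython, at least), which is the analogue of what Substring does in .NET.

```python
def third_field(line: str) -> str:
    """Return a fresh copy of the third comma-separated field of line."""
    first_comma = line.index(",")
    second_comma = line.index(",", first_comma + 1)
    # The third field ends at the next comma, or at the end of the line.
    third_comma = line.find(",", second_comma + 1)
    end = third_comma if third_comma != -1 else len(line)
    # Slicing copies the field's characters into a new string.
    return line[second_comma + 1:end]


line = "1001,Jane,Robinson,42 Elm Street,Springfield"
last_name = third_field(line)
print(last_name)           # Robinson
print(len(last_name) * 2)  # 16, the bytes of character data at 2 bytes per UTF-16 char
```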

The substrings extracted are typically small in size and short in lifetime; the garbage collector is going to reclaim them soon, and they didn't take up much room on the heap in the first place. So using a persistent strategy that encourages reuse of most of the memory is also not a win; all you've done is made your garbage collector get slower because now it has to worry about handling interior pointers.

If the substring operations people typically did on strings were completely different, then it would make sense to go with a persistent approach. If people typically had million-character strings, and were extracting thousands of overlapping substrings with sizes in the hundred-thousand-character range, and those substrings lived a long time on the heap, then it would make perfect sense to go with a persistent substring approach; it would be wasteful and foolish not to. But most line-of-business programmers do not do anything even vaguely like those sorts of things. .NET is not a platform that is tailored for the needs of the Human Genome Project; DNA analysis programmers have to solve problems with those string usage characteristics every day; odds are good that you do not. The few who do build their own persistent data structures that closely match their usage scenarios.
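
To put rough numbers on the two workloads, here is a back-of-the-envelope calculation in Python. It assumes 2 bytes per character (UTF-16, as .NET strings use) and, purely for illustration, a 16-byte descriptor for a pointer-plus-length substring; object headers are ignored, and "thousands" is taken as 1,000.

```python
BYTES_PER_CHAR = 2   # UTF-16 code unit, as in .NET strings
VIEW_BYTES = 16      # assumed size of a (pointer, length) descriptor; illustrative only


def copied_bytes(count: int, chars_each: int) -> int:
    """Character data copied when every substring is a fresh string."""
    return count * chars_each * BYTES_PER_CHAR


def shared_bytes(count: int) -> int:
    """Extra memory when every substring only points into the original."""
    return count * VIEW_BYTES


# Genome-style workload: 1,000 substrings of 100,000 characters each.
print(copied_bytes(1_000, 100_000))  # 200000000 bytes, about 200 MB
print(shared_bytes(1_000))           # 16000 bytes

# Line-of-business workload: one 25-character name from a CSV line.
print(copied_bytes(1, 25))           # 50 bytes
print(shared_bytes(1))               # 16 bytes
```

Under these assumptions the persistent approach saves four orders of magnitude in the first workload and a few dozen bytes in the second, which is the asymmetry the answer is describing.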

For example, my team writes programs that do on-the-fly analysis of C# and VB code as you type it. Some of those code files are enormous and thus we cannot be doing O(n) string manipulation to extract substrings or insert or delete characters. We have built a bunch of persistent immutable data structures for representing edits to a text buffer that permit us to quickly and efficiently re-use the bulk of the existing string data and the existing lexical and syntactic analyses upon a typical edit. This was a hard problem to solve and its solution was narrowly tailored to the specific domain of C# and VB code editing. It would be unrealistic to expect the built-in string type to solve this problem for us.
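
The answer does not describe those data structures, so the following is not the team's design. It is only a minimal Python sketch of one well-known way to get the same kind of reuse, a piece table whose versions are immutable. The `Piece` and `TextBuffer` names are hypothetical. An insert builds a new list of pieces that still point into the old text, so no existing characters are copied and the previous version remains valid.

```python
from typing import NamedTuple, Tuple


class Piece(NamedTuple):
    buffer: str   # shared, immutable text this piece points into
    start: int
    length: int


class TextBuffer:
    """Hypothetical persistent text buffer: an immutable tuple of pieces."""

    def __init__(self, pieces: Tuple[Piece, ...]):
        self.pieces = pieces

    @classmethod
    def from_text(cls, text: str) -> "TextBuffer":
        return cls((Piece(text, 0, len(text)),) if text else ())

    def insert(self, position: int, text: str) -> "TextBuffer":
        """Return a new buffer with text inserted; this buffer is unchanged."""
        if not text:
            return self
        new_pieces = []
        inserted = False
        offset = 0  # document index of the current piece's first character
        for piece in self.pieces:
            if not inserted and offset <= position <= offset + piece.length:
                split = position - offset
                # Split the piece around the insertion point without copying text.
                if split > 0:
                    new_pieces.append(Piece(piece.buffer, piece.start, split))
                new_pieces.append(Piece(text, 0, len(text)))
                if split < piece.length:
                    new_pieces.append(
                        Piece(piece.buffer, piece.start + split, piece.length - split)
                    )
                inserted = True
            else:
                new_pieces.append(piece)
            offset += piece.length
        if not inserted:
            if position != offset:
                raise IndexError("insert position out of range")
            new_pieces.append(Piece(text, 0, len(text)))
        return TextBuffer(tuple(new_pieces))

    def to_text(self) -> str:
        # O(n) copy, needed only when a flat string is actually required.
        return "".join(p.buffer[p.start:p.start + p.length] for p in self.pieces)


v1 = TextBuffer.from_text("class C { }")
v2 = v1.insert(10, "int x; ")
print(v1.to_text())  # class C { }
print(v2.to_text())  # class C { int x; }
print(v2.pieces[0].buffer is v1.pieces[0].buffer)  # True: old text is reused
```

In this sketch an insert costs time proportional to the number of pieces, not the number of characters. A production editor would more likely keep the pieces in a balanced tree so that an edit touches only O(lg n) nodes.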
