多少钱一个StringBuilder,像C模块中增加缓冲区? [英] How much to grow buffer in a StringBuilder-like C module?

查看:99
本文介绍了多少钱一个StringBuilder,像C模块中增加缓冲区?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在C,我的工作是管理一个字节的缓冲区,允许任意的数据附加到最后一个类。现在我正在寻找进入自动调整为基础数组填满使用到的realloc 来电。这应该是有意义的是谁曾经使用Java或C#的StringBuilder 任何人。我知道如何去调整大小。但是没有任何人有任何建议,与理规定,关于多少的成长与缓冲区每个大小?

In C, I'm working on a "class" that manages a byte buffer, allowing arbitrary data to be appended to the end. I'm now looking into automatic resizing as the underlying array fills up using calls to realloc. This should make sense to anyone who's ever used Java or C# StringBuilder. I understand how to go about the resizing. But does anyone have any suggestions, with rationale provided, on how much to grow the buffer with each resize?

显然,有关于浪费的空间和过多的realloc呼叫(这可能导致过度的复制)之间进行交易。我见过一些教程/那建议增加一倍的文章。如果用户设法提供一个良好的初始猜测似乎浪费。难道是值得尝试舍入到两个部分权力或调整大小的平台上多?

Obviously, there's a trade off to be made between wasted space and excessive realloc calls (which could lead to excessive copying). I've seen some tutorials/articles that suggest doubling. That seems wasteful if the user manages to supply a good initial guess. Is it worth trying to round to some power of two or a multiple of the alignment size on a platform?

是否有任何人知道什么是Java或C#引擎盖下呢?

Does any one know what Java or C# does under the hood?

推荐答案

在C#中使用将增长StringBuilder的使用的内部缓冲区策略随时间的变化。

In C# the strategy used to grow the internal buffer used by a StringBuilder has changed over time.

有解决这个问题的三种基本的策略,并且它们具有不同的性能特点。

There are three basic strategies for solving this problem, and they have different performance characteristics.

第一个基本策略是:


  • 请字符数组

  • 当你运行的空间,创造具有k多个字符的新数组,对于一些常数k。

  • 旧数组复制到新阵列,孤儿旧数组。

该战略有一些问题,其中最明显的是,它是O(n 2 )的时间,如果在建的字符串是非常大的。比方说,k是一个万字,最后一个字符串是一个万字。你最终在1000重新分配字符串,2000,3000,4000,...,因此复制1000 + 2000 + 3000 + 4000 + ... + 999000字符,概括为500十亿复制的字符的顺序!

This strategy has a number of problems, the most obvious of which is that it is O(n2) in time if the string being built is extremely large. Let's say that k is a thousand characters and the final string is a million characters. You end up reallocating the string at 1000, 2000, 3000, 4000, ... and therefore copying 1000 + 2000 + 3000 + 4000 + ... + 999000 characters, which sums to on the order of 500 billion characters copied!

这策略有认为浪费了内存量为k界优雅的属性。

This strategy has the nice property that the amount of "wasted" memory is bounded by k.

在实践这一战略中很少使用,因为那正平方的问题。

In practice this strategy is seldom used because of that n-squared problem.

第二个基本策略是


  • 请数组

  • 当你运行的空间,创造具有k%多个字符的新数组,对于一些常数k。

  • 旧数组复制到新阵列,孤儿旧数组。

K%通常为100%;如果是那么这就是所谓的双满时的策略。

k% is usually 100%; if it is then this is called the "double when full" strategy.

这策略有它的摊销的很好的特性成本为O(n)。假设最后一遍字符串是一个万字,你开始万千。你在1000,2000,4000,8000,复印......并最终复制1000 + 2000 + 4000 + 8000 ... + 512000字符,总结复制到一百万字;好多了。

This strategy has the nice property that its amortized cost is O(n). Suppose again the final string is a million characters and you start with a thousand. You make copies at 1000, 2000, 4000, 8000, ... and end up copying 1000 + 2000 + 4000 + 8000 ... + 512000 characters, which sums to about a million characters copied; much better.

该战略的摊余成本是线性的财产的无论你选择什么样的百分比。

The strategy has the property that the amortized cost is linear no matter what percentage you choose.

这策略有很多缺点的那有时在复制操作是极其昂贵的的和的你可以浪费了未使用的内存为k,最后一个字符串长度的%的。

This strategy has a number of downside that sometimes a copy operation is extremely expensive, and you can be wasting up to k% of the final string length in unused memory.

第三个策略是使阵列的一个链表,大小为k的每个阵列。时溢出现有阵列,一个新的分配和附加到列表的末尾。

The third strategy is to make a linked list of arrays, each array of size k. When you overflow an existing array, a new one is allocated and appended to the end of the list.

这策略有很好的特性,没有操作特别昂贵,总内存浪费为k界,而你并不需要能够找到大块堆定期。它有不利的一面终于翻东西转换成字符串可能是昂贵的在链表中的阵列可能有局域性较差。

This strategy has the nice property that no operation is particularly expensive, the total wasted memory is bounded by k, and you don't need to be able to locate large blocks in the heap on a regular basis. It has the downside that finally turning the thing into a string can be expensive as the arrays in the linked list might have poor locality.

在习惯使用双时满战略.NET Framework的字符串生成器;它现在使用链表-的块策略。

The string builder in the .NET framework used to use a double-when-full strategy; it now uses a linked-list-of-blocks strategy.

这篇关于多少钱一个StringBuilder,像C模块中增加缓冲区?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆