Comparison of String and StringBuilder manipulation in terms of memory usage


Problem description

According to the SCJP Study Guide by Kathy Sierra:

The java.lang.StringBuffer and java.lang.StringBuilder classes should be used when you have to make modifications to strings of characters. As we discussed, String objects are immutable, so if you choose to do a lot of manipulations with String objects, you will end up with a lot of abandoned String objects in the String pool.

To clear this up, I went through the source code of the String and StringBuilder classes.

The simplified code of String looks like this:

public final class String {
    private final char[] value; // final helps with immutability, I guess.

    public String(String original) {
        value = original.value;
    }
}

And StringBuilder's simplified version looks like this:

public final class StringBuilder {
    char[] value;

    public StringBuilder(String str) {
        value = new char[str.length() + 16]; // 16 here is implementation dependent.
        append(str);
    }

    public StringBuilder append(String str) {
        // If the 'value' array has room, copy the characters of 'str' into it.
        // Otherwise, create a new array of size newCapacity and copy the
        // contents of 'value' into it first:
        //     value = Arrays.copyOf(value, newCapacity); // the old array object is lost here.
        return this;
    }
}

So let's say we have the following case:

Using String class:

String s1 = "abc";   // Creates one object in the String pool.
s1 = s1 + "def";     // Creates two objects: "def" in the String pool
                     // and "abcdef" on the heap.
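The object-creation comments above can be checked with a small sketch. The class name ConcatIdentityDemo and the helper concat are hypothetical names chosen for illustration. It relies only on two standard behaviors: concatenating a non-constant operand at run time produces a new String object, and String.intern() returns the pooled instance with the same characters.

```java
// Illustrative sketch; ConcatIdentityDemo and concat are hypothetical names.
class ConcatIdentityDemo {
    // 'left' is not a compile-time constant, so the concatenation happens
    // at run time and yields a freshly allocated String on the heap.
    static String concat(String left, String right) {
        return left + right;
    }

    public static void main(String[] args) {
        String s1 = "abc";               // literal: refers to the pooled "abc"
        String s2 = concat(s1, "def");   // new String object built at run time

        System.out.println(s2.equals("abcdef"));      // true: same characters
        System.out.println(s2 == "abcdef");           // false: not the pooled literal
        System.out.println(s2.intern() == "abcdef");  // true: intern() returns the pooled instance
    }
}
```

Under these assumptions, the intermediate result of a concatenation lives on the ordinary heap and only enters the pool if it is interned explicitly.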

If I use StringBuilder, the code becomes:

StringBuilder s1 = new StringBuilder("abc");
// Creates one String object "abc" in the String pool
// and one StringBuilder object holding "abc" on the heap.
s1.append("def");
// Creates one String object "def" in the String pool
// and changes the char[] inside the StringBuilder to hold "def" as well.

With StringBuilder s2 = s1.append("def"); the char array holding the characters may be replaced by a new, larger char array. The old array is then unreferenced and will be garbage collected.
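Whether the array is actually replaced depends on the spare capacity. A minimal sketch, assuming only the documented behavior that new StringBuilder(String) starts with a capacity of 16 plus the argument's length; the class name CapacityDemo and the helper capacityAround are hypothetical:

```java
// Illustrative sketch; CapacityDemo and capacityAround are hypothetical names.
class CapacityDemo {
    // Returns {capacity before the append, capacity after the append}.
    static int[] capacityAround(String initial, String appended) {
        StringBuilder sb = new StringBuilder(initial);
        int before = sb.capacity();
        sb.append(appended);
        return new int[] { before, sb.capacity() };
    }

    public static void main(String[] args) {
        int[] caps = capacityAround("abc", "def");
        System.out.println(caps[0]); // 19: 16 plus the length of "abc"
        System.out.println(caps[1]); // 19: "abcdef" fits, so no new array was needed

        // Appending more than the spare capacity forces a larger array;
        // the exact new size is implementation dependent.
        int[] grown = capacityAround("abc", "0123456789abcdefghij");
        System.out.println(grown[1] >= 23); // true: capacity covers the 23 chars
    }
}
```

In the example from the question, then, the append("def") call reuses the existing array and no array is discarded.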

My question is:

With both simple String concatenation and the StringBuilder append() method, the number of String objects that end up in the String pool is the same.

And according to the code listed above, StringBuilder uses larger char arrays in the first place, while a String object contains a char array of exactly the same length as the string it holds.

  1. How is using StringBuilder more efficient than the plain String class for string manipulation?
  2. Is the statement given in the SCJP guide wrong?

Solution

The key is the expandCapacity function:

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2; //important part here
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

Because the underlying array roughly doubles in size every time a resize is needed, the amortized time needed to append one character is constant.
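The effect of this growth rule can be estimated without timing anything. The following sketch is a simplified model, not the JDK's code: it assumes an initial capacity of 16, one character appended at a time, and the rule newCapacity = (oldCapacity + 1) * 2 from expandCapacity. For comparison it also assumes that each s = s + "a" writes a complete new string. The class and method names are hypothetical.

```java
// Simplified cost model; GrowthSimulation and its methods are hypothetical names.
class GrowthSimulation {
    // Simulates single-char appends with the (oldCapacity + 1) * 2 rule.
    // Returns {number of expansions, total chars copied by expansions, final capacity}.
    static long[] simulateBuilder(int numAppends, int initialCapacity) {
        long capacity = initialCapacity;
        long expansions = 0;
        long copied = 0;
        for (long length = 0; length < numAppends; length++) {
            if (length == capacity) {        // array is full: grow before appending
                copied += length;            // Arrays.copyOf moves the existing chars
                capacity = (capacity + 1) * 2;
                expansions++;
            }
        }
        return new long[] { expansions, copied, capacity };
    }

    // Models s = s + "a" repeated numAppends times: step i builds a
    // brand-new string of length i + 1.
    static long simulateConcat(int numAppends) {
        long written = 0;
        for (long i = 0; i < numAppends; i++) {
            written += i + 1;
        }
        return written;
    }

    public static void main(String[] args) {
        long[] r = simulateBuilder(200000, 16);
        System.out.println("expansions: " + r[0]);            // 14
        System.out.println("chars copied: " + r[1]);          // 294866
        System.out.println("final capacity: " + r[2]);        // 294910
        System.out.println("concat chars written: " + simulateConcat(200000)); // 20000100000
    }
}
```

Under this model the builder copies fewer than 2n characters in total for n appends, while repeated concatenation writes on the order of n^2 / 2 characters.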

Wikipedia has a good explanation:

As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python's list data structure uses a = 9/8.

Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to support mixed sequences of insertions and removals with amortized constant cost.

Dynamic arrays are a common example when teaching amortized analysis.
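The a/(a−1) figure in the quote can be recovered with a short geometric-series argument. This is a sketch under simplifying assumptions that are not in the original: the array starts at capacity c, grows by exactly a factor a > 1 when full, and n > c elements are appended one at a time. Let m be the smallest integer with n ≤ c·a^m, so that c·a^(m−1) < n. Expansion number k copies the c·a^k elements present at that moment, so the total number of copied elements is:

```latex
\sum_{k=0}^{m-1} c\,a^{k}
  \;=\; c\,\frac{a^{m}-1}{a-1}
  \;<\; c\,a^{m-1}\cdot\frac{a}{a-1}
  \;<\; n\cdot\frac{a}{a-1}
```

Dividing by n gives fewer than a/(a−1) copied elements per insertion on average, a constant independent of n. For a = 2 that is fewer than 2 copies per append. Java's rule of (length + 1) * 2 grows slightly faster than pure doubling, so the same bound should apply to it approximately.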

Now for your particular example it would not make a difference, but you will see the effects when appending lots of characters:

public static void main(String[] args){
    int numAppends = 200000;
    int numRepetitions = 3;
    long[] time1 = new long[numRepetitions];
    long[] time2 = new long[numRepetitions];
    long now;
    for (int k = 0; k < numRepetitions; k++){
        String s = "";
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            s = s + "a";
        }
        time1[k] = (System.nanoTime() - now) / 1000000;
        StringBuilder sb = new StringBuilder();
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            sb.append("a");     
        }
        time2[k] = (System.nanoTime() - now) / 1000000;
        System.out.println("Rep "+k+", time1: "+time1[k]+ " ms, time2: " + time2[k] + " ms");
    }
}

Output:

Rep 0, time1: 13509 ms, time2: 7 ms
Rep 1, time1: 13086 ms, time2: 1 ms
Rep 2, time1: 13167 ms, time2: 1 ms

Also, I counted the number of times the Arrays.copyOf() method is called inside expandCapacity() during the benchmark: 49 times on the first iteration, but only 15 times on the second and third iterations. The output is as follows:

newCapacity: 34
newCapacity: 70
newCapacity: 142
newCapacity: 286
newCapacity: 574
newCapacity: 1150
newCapacity: 2302
newCapacity: 4606
newCapacity: 9214
newCapacity: 18430
newCapacity: 36862
newCapacity: 73726
newCapacity: 147454
newCapacity: 294910
newCapacity: 42
Rep 2, time1: 12955 ms, time2: 134 ms
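When the final length is known in advance, the expansions listed above can be avoided by passing an initial capacity to the standard StringBuilder(int capacity) constructor. A minimal sketch; the class name PresizedBuilderDemo and the helper capacityAfterAppends are hypothetical:

```java
// Illustrative sketch; PresizedBuilderDemo and capacityAfterAppends are hypothetical names.
class PresizedBuilderDemo {
    // Appends numAppends copies of 'a' to a builder created with the given
    // initial capacity and returns the builder's capacity afterwards.
    static int capacityAfterAppends(int initialCapacity, int numAppends) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < numAppends; i++) {
            sb.append('a');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Pre-sized: the capacity never changes, so the array is never replaced.
        System.out.println(capacityAfterAppends(200000, 200000)); // 200000
        // Default-sized: the array is replaced several times along the way,
        // and the final capacity is implementation dependent.
        System.out.println(capacityAfterAppends(16, 200000) >= 200000); // true
    }
}
```

This trades a larger up-front allocation for zero discarded intermediate arrays.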
