搜索字符串实习的成本和文字字符串的声明 [英] Search cost of string interning and declaration of literal strings

查看:139
本文介绍了搜索字符串实习的成本和文字字符串的声明的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

两个问题。


  1. 当我们声明文字字符串时,我们搜索堆的字符串池中是否有相同的字符串。 这也是一个实习生(类 String 的方法实习生)?

  1. When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning (method intern of class String)?

在我看来,每个文字字符串声明都需要二进制搜索,所以当 n 是池中现有字符串的数量时,它至少要花费 log(n)。如果池中有许多字符串,则可能成本很高。 (可能需要权衡搜索成本和内存?)从这个角度来看,声明mant文字字符串可能很危险。
搜索成本的重要性以及为什么以这种方式设计java(在声明文字字符串时搜索池)。

In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool. And if there are many strings in the pool, it may be high cost. (maybe tradeoff of searching cost and memory?) On this point of view, it might be dangerous to declare mant literal strings. How significant is this searching cost and why java is designed in this way(searching pool when literal strings are declared).

以下是我所提到的背景知识。

Following is what I referred to understand background.

java.lang.String的JavaDoc class 状态:

The JavaDoc for the java.lang.String class states:


字符串是常量;它们的值在创建后无法更改。字符串缓冲区支持可变字符串。因为String对象是不可变的,所以可以共享它们。

Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared.

http://www.janeg.ca/scjp/lang/strLiteral.html 评论:


换句话说,因为编译器知道字符串原始值一旦创建就无法更改,它可以安全地使用现有数据并避免使用重复文件混乱内存。

In other words, because the compiler knows the strings original value cannot be changed once it's created it can safely use existing data and avoid cluttering up memory with duplicates.


推荐答案

您将编译时复杂性与运行时复杂性混淆。

You're confusing compile time complexity with runtime complexity.

当加载类时,是的它会搜索到看看每个文字是否已经存在(虽然我想它会使用哈希图进行O(1)查找而不是你的提议)。

When the class is loaded, yes it does a search to see if each literal already exists (although I imagine it would use a hashmap for O(1) lookup instead of your proposal).

当代码运行时,它有引用内存中的字符串,因此没有额外的成本而不是非文字。

When the code runs, it has the reference to the string in memory so there is no additional cost than a non-literal.

所以是的,文字被实习。根据Javadoc for String,

So yes, literals are interned. According to the Javadoc for String,


字符串池(最初为空)由String类私有维护。

A pool of strings, initially empty, is maintained privately by the class String.

您可以在String上调用 intern()将其添加到此池。从逻辑上讲,如果 a.equals(b)那么 a.intern()== b.intern() ,因为 .intern()保证从唯一池返回。

You can invoke intern() on a String to add it to this pool. It logically follows that if a.equals(b) then a.intern() == b.intern(), since .intern() guarantees to return from the a unique pool.

示例:

class InternTest {
    // assuming InternTest is the only class, internPool.size = 0
    String x = "ABC"; // interned at class load, internPool.size = 1
    String y = "DEF"; // interned at class load, internPool.size = 2
    String z = "ABC"; // interned at class load, but match found - size = 2 still

    void foo() {
        // random int is just a mechanism to get something that I know won't
        // be interned at loadtime - could have loaded from file or database too
        int i = (new java.util.Random()).nextInt(1000) + 100;
        int j = i;
        String s = String.valueOf(i); // not yet interned, size = 2 still
        String t = String.valueOf(j); // not yet interned, size = 2 still

        String sIntern = s.intern(); // manually interned, size = 3 now
        String tIntern = t.intern(); // manually interned, match found, size = 3 still

        System.out.println("equals: " + (s.equals(t))); // should be true
        System.out.println("== raw: " + (s == t)); // should be false, different variables
        System.out.println("== int: " + (sIntern == tIntern)); // should be true, from unique pool

       System.out.println("x and z: " + (x == z)); // should be true, interned at class load
    }

    public static void main(String[] args) {
        (new InternTest()).foo();
    }

}

运行时的结果:

C:\Documents and Settings\glowcoder\My Documents>java InternTest
equals: true
== raw: false
== int: true
x and z: true

我应该指出,这种假设永远不会成真。 Java语言本身有许多 String s,在我们的 String s有机会看到光之前会被实现一天但是,假设所有内容都按顺序加载,如果你只考虑被绑定的字符串的增量,并假设没有与现有实习生发生冲突(我们都知道实习生可以挑剔并充满戏剧性,对吗? snicker )那么数字的确表示字符串池大小的增量。

I should point out that the assumption will never be true. The Java language itself has many Strings that would be interned before our Strings ever got the chance to see the light of day. However, assuming everything is loaded sequentially, if you consider only the delta of Strings being interned, and assume no collisions with existing interns (we all know interns can be fussy and full of drama, right? snicker) then the numbers do indeed indicate the delta of the size of the string pool.

这篇关于搜索字符串实习的成本和文字字符串的声明的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆