When will the new String() object in memory get cleared after invoking the intern() method?


Question

List<String> list = new ArrayList<>();
for (int i = 0; i < 1000; i++)
{
    StringBuilder sb = new StringBuilder();
    String string = sb.toString();
    string = string.intern();
    list.add(string);
}

In the sample above, after string.intern() is invoked, when will the 1000 objects created on the heap (by sb.toString()) be cleared?

Edit 1: If there is no guarantee that these objects will be cleared, and assuming the GC hasn't run, is it pointless to use string.intern() at all (in terms of memory usage)?

Is there any way to reduce memory usage / object creation while using the intern() method?

Recommended answer

Your example is a bit odd, as it creates 1000 empty strings. If you want such a list while consuming a minimum of memory, you should use

List<String> list = Collections.nCopies(1000, "");

instead.

If we assume that something more sophisticated is going on, not creating the same string in every iteration, then there is no benefit in calling intern(). What happens is implementation dependent: when intern() is called on a string that is not in the pool, in the best case that string is simply added to the pool; in the worst case, another copy is made and added to the pool.

At this point, we have no savings yet, but have potentially created additional garbage.
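The effect can be made visible with a small sketch. The class name InternDemo, the method build and the "value-" prefix are made up for this illustration; only String.intern() and the reference comparison come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

final class InternDemo {
    /** Interns n strings built at run time; 'distinct' controls whether their contents differ. */
    static List<String> build(int n, boolean distinct) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // A fresh temporary String is created here in every iteration.
            String temp = new StringBuilder("value-").append(distinct ? i : 0).toString();
            // intern() returns the canonical instance; the temporary becomes garbage
            // unless it was itself the instance added to the pool.
            list.add(temp.intern());
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> same = build(3, false);
        List<String> different = build(3, true);
        System.out.println(same.get(0) == same.get(2));           // true: duplicates share one instance
        System.out.println(different.get(0) == different.get(2)); // false: nothing to share
    }
}
```

In the second case every string is unique, so interning adds pool entries without saving anything.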

Interning at this point can only save memory if there are duplicates somewhere. This implies that you construct the duplicate strings first and look up their canonical instance via intern() afterwards, so having the duplicate string in memory until it is garbage collected is unavoidable. But that’s not the real problem with interning:


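  • In older JVMs, interned strings received special treatment that could result in worse garbage collection performance or even in running out of resources (i.e. the fixed-size PermGen space).

  • In HotSpot, the string pool holding the interned strings is a fixed-size hash table, which yields hash collisions and hence poor performance when it references significantly more strings than the table size.

    Before Java 7, update 40, the default size was about 1,000, which is not even sufficient to hold all string constants of any nontrivial application without hash collisions, let alone manually added strings. Later versions use a default size of about 60,000, which is better, but it is still a fixed size that should discourage you from adding an arbitrary number of strings.

  • The string pool has to obey the inter-thread semantics mandated by the language specification (as it is used for string literals), and therefore has to perform thread-safe updates, which can degrade performance.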

Keep in mind that you pay the price of the disadvantages named above even when there are no duplicates, i.e. when there is no space saving. Also, the acquired reference to the canonical string has to have a much longer lifetime than the temporary object used to look it up in order to have any positive effect on memory consumption.

The latter touches on your literal question. The temporary instances are reclaimed the next time the garbage collector runs, which will be when the memory is actually needed. There is no need to worry about when this will happen, but yes, up to that point, acquiring a canonical reference has had no positive effect, not only because the memory hasn’t been reused by then, but also because the memory was not actually needed until then.

This is the place to mention the new String Deduplication feature. It does not change string instances, i.e. the identity of these objects, as that would change the semantics of the program, but changes identical strings to use the same char[] array. Since these character arrays are the biggest payload, this may still achieve great memory savings without the performance disadvantages of using intern(). Since this deduplication is done by the garbage collector, it is only applied to strings that have survived long enough to make a difference. This also implies that it will not waste CPU cycles while there is still plenty of free memory.
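As a rough illustration of the identity point: in HotSpot the feature is switched on with the -XX:+UseStringDeduplication flag (originally in combination with the G1 collector). The class and method names below are invented for this sketch. The sharing of the backing array cannot be observed from plain Java code, so the example only shows what stays the same whether or not the flag is set.

```java
// Possible launch command on HotSpot: java -XX:+UseG1GC -XX:+UseStringDeduplication DedupIdentityDemo
final class DedupIdentityDemo {
    /** Builds a fresh String instance with the given content. */
    static String fresh(String content) {
        return new StringBuilder(content).toString();
    }

    public static void main(String[] args) {
        String a = fresh("payload");
        String b = fresh("payload");
        System.out.println(a.equals(b));              // true: same content
        System.out.println(a == b);                   // false: two objects, with or without deduplication
        System.out.println(a.intern() == b.intern()); // true: intern() yields one canonical instance
    }
}
```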

However, there may be cases where manual canonicalization is justified. Imagine we’re parsing a source code file or an XML file, or importing strings from an external source (a Reader or a database), where such canonicalization does not happen by default but duplicates are likely to occur. If we plan to keep the data around for a longer time for further processing, we may want to get rid of duplicate string instances.

In this case, one of the best approaches is to use a local map that is not subject to thread synchronization and to drop it after the process, which avoids keeping references longer than necessary without requiring any special interaction with the garbage collector. This implies that occurrences of the same string in different data sources are not canonicalized (though they are still subject to the JVM’s String Deduplication), but that is a reasonable trade-off. By using an ordinary resizable HashMap, we also avoid the issues of the fixed-size intern table.

For example:

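// TOKEN_PATTERN is a java.util.regex.Pattern defined elsewhere
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();

    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    // purely local cache, dropped when the method returns
    HashMap<CharSequence,String> cache = new HashMap<>();
    while(m.find()) {
        // look up via a lightweight view; a String is only created on a cache miss
        result.add(
            cache.computeIfAbsent(cb.subSequence(m.start(), m.end()), Object::toString));
    }
    return result;
}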

Note the use of the CharBuffer here: it wraps the input sequence, and its subSequence method returns another wrapper with a different start and end index that implements the right equals and hashCode methods for our HashMap. computeIfAbsent only invokes the toString method if the key was not already present in the map. So, unlike with intern(), no String instance is created for strings that have already been encountered, which saves the most expensive aspect, the copying of the character arrays.
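The key property, that two CharBuffer views with the same characters are equal and hash alike, can be checked in isolation. This is a minimal sketch; the class CharBufferKeyDemo, the helper view and the sample input are assumptions made for the illustration.

```java
import java.nio.CharBuffer;
import java.util.HashMap;
import java.util.Map;

final class CharBufferKeyDemo {
    /** Returns a view of input[start, end) without copying any characters. */
    static CharSequence view(CharSequence input, int start, int end) {
        return CharBuffer.wrap(input).subSequence(start, end);
    }

    public static void main(String[] args) {
        String input = "foo bar foo";
        CharSequence first = view(input, 0, 3);
        CharSequence second = view(input, 8, 11);
        System.out.println(first.equals(second));                  // true: same remaining characters
        System.out.println(first.hashCode() == second.hashCode()); // true
        System.out.println(first.equals("foo"));                   // false: a String is not a CharBuffer

        Map<CharSequence, String> cache = new HashMap<>();
        String a = cache.computeIfAbsent(first, Object::toString);  // miss: creates the String
        String b = cache.computeIfAbsent(second, Object::toString); // hit: no new String
        System.out.println(a == b); // true
    }
}
```

Because a CharBuffer never equals a plain String, the map keys have to be CharBuffer instances throughout.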

If the likelihood of duplicates is really high, we can even avoid creating the wrapper instances:

// TOKEN_PATTERN is a java.util.regex.Pattern defined elsewhere
static List<String> parse(CharSequence input) {
    List<String> result = new ArrayList<>();

    Matcher m = TOKEN_PATTERN.matcher(input);
    CharBuffer cb = CharBuffer.wrap(input);
    HashMap<CharSequence,String> cache = new HashMap<>();
    while(m.find()) {
        // reposition the single shared buffer onto the current match
        cb.limit(m.end()).position(m.start());
        String s = cache.get(cb);
        if(s == null) {
            s = cb.toString();
            // use a separate wrapper as the key, since cb keeps changing
            cache.put(CharBuffer.wrap(s), s);
        }
        result.add(s);
    }
    return result;
}

This creates only one wrapper per unique string, but also has to perform one additional hash lookup for each unique string when putting it into the map. Since creating a wrapper is quite cheap, you need a significantly large number of duplicate strings, i.e. a small number of unique strings compared to the total number, to benefit from this trade-off.

As said, these approaches are very efficient because they use a purely local cache that is simply dropped afterwards. With this, we have to neither deal with thread safety nor interact with the JVM or garbage collector in a special way.
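The same local-cache idea also works when the values already arrive as String objects, for instance rows fetched from a database. In that case no construction cost can be saved, but duplicates can still be dropped early. The following is only a sketch under that assumption; the names LocalCanonicalizer and canonicalize are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocalCanonicalizer {
    /** Replaces equal strings by the first instance seen, using a throwaway local map. */
    static List<String> canonicalize(Iterable<String> source) {
        Map<String, String> cache = new HashMap<>(); // local, so no synchronization is needed
        List<String> result = new ArrayList<>();
        for (String s : source) {
            String canonical = cache.putIfAbsent(s, s); // null if s is the first of its kind
            result.add(canonical != null ? canonical : s);
        }
        return result; // the cache becomes unreachable once the method returns
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        rows.add(new String("alpha"));
        rows.add(new String("beta"));
        rows.add(new String("alpha"));
        List<String> out = canonicalize(rows);
        System.out.println(rows.get(0) == rows.get(2)); // false: two separate instances came in
        System.out.println(out.get(0) == out.get(2));   // true: one instance is kept
    }
}
```

The discarded duplicates stay in memory until the next garbage collection, just like the temporaries passed to intern(), but no global table is touched.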
