Java collection and memory optimization

Question

I wrote a custom index to a custom table which uses 500MB of heap for 500k strings. Only 10% of the strings are unique; the rest are repeats. Every string is of length 4.

How can I optimize my code? Should I use another collection? I tried to implement a custom string pool to save memory:

import java.util.WeakHashMap;

public class StringPool {

    private static final WeakHashMap<String, String> map = new WeakHashMap<>();

    // Returns the pooled instance equal to str, adding str to the pool if absent.
    public static String getString(String str) {
        String pooled = map.get(str);
        if (pooled == null) {
            map.put(str, str);
            pooled = str;
        }
        return pooled;
    }
}

private void buildIndex() {
    if (monitorModel.getMessageIndex() == null) {
        // The index: one map per filterable column.
        ArrayList<HashMap<String, TreeSet<Integer>>> messageIndex = new ArrayList<>(filterableColumn.length);
        for (int i = 0; i < filterableColumn.length; i++) {
            // key -> cell value, value -> TreeSet of the rows that contain the key
            messageIndex.add(new HashMap<String, TreeSet<Integer>>());
        }
        // Fill the index for every column of every row.
        for (int i = monitorModel.getParser().getMyMessages().getMessages().size() - 1; i >= 0; --i) {
            for (int j = 0; j < filterableColumn.length; j++) {
                String value = StringPool.getString(getValueAt(i, j).toString());
                HashMap<String, TreeSet<Integer>> columnIndex = messageIndex.get(j);
                TreeSet<Integer> rows = columnIndex.get(value);
                if (rows == null) {
                    rows = new TreeSet<>();
                    columnIndex.put(value, rows);
                }
                rows.add(i);
            }
        }
        monitorModel.setMessageIndex(messageIndex);
    }
}

Recommended answer

You might want to examine your memory heap in a profiler. My guess is that the memory consumption isn't primarily in the String storage, but in the many TreeSet<Integer> instances. If so, you could optimize considerably by using primitive arrays (int[], short[], or byte[], depending on the actual size of the integer values you're storing). Or you could look into a primitive collection type, such as those provided by FastUtil or Trove.
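
As a rough sketch of the primitive-array idea, assuming row numbers fit in an int: each key maps to a growable int[] instead of a TreeSet<Integer>, which avoids one boxed Integer and one tree node per row. The class names IntRowList and ColumnIndex are hypothetical and not part of the original code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a growable list of primitive ints.
final class IntRowList {
    private int[] rows = new int[4];
    private int size;

    void add(int row) {
        if (size == rows.length) {
            rows = Arrays.copyOf(rows, size * 2); // double the capacity when full
        }
        rows[size++] = row;
    }

    int size() {
        return size;
    }

    // Returns a trimmed copy of the stored row numbers, in insertion order.
    int[] toArray() {
        return Arrays.copyOf(rows, size);
    }
}

// Hypothetical single-column index: cell value -> rows containing that value.
final class ColumnIndex {
    private final Map<String, IntRowList> index = new HashMap<>();

    void add(String value, int row) {
        IntRowList list = index.get(value);
        if (list == null) {
            list = new IntRowList();
            index.put(value, list);
        }
        list.add(row);
    }

    // Returns the rows recorded for value, or an empty array if there are none.
    int[] rowsFor(String value) {
        IntRowList list = index.get(value);
        return list == null ? new int[0] : list.toArray();
    }
}
```

Unlike a TreeSet, this sketch neither sorts nor removes duplicates. If rows are added in ascending order, each array stays sorted and can be searched with Arrays.binarySearch.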

If you do find that the String storage is problematic, I'll assume that you want to scale your application beyond 500k Strings, or that especially tight memory constraints require you to deduplicate even short Strings.

As Dev said, String.intern() will deduplicate Strings for you. One caveat, however: in the Oracle and OpenJDK virtual machines, String.intern() stores those Strings in the VM's permanent generation, so they will not be garbage-collected in the future. That's appropriate (and helpful) if:

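  1. The Strings you're storing will not change throughout the lifetime of the VM (for example, if you read in a static list at startup and use it for the life of the application).
  2. The Strings you need to store fit comfortably in the VM's permanent generation (with adequate room left for class loading and other users of PermGen). Update: see below.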
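
To illustrate the deduplication that String.intern() performs, here is a small sketch that counts distinct String objects by identity before and after interning. The class and method names (InternDemo, distinctObjects, internAll) are made up for the example.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo class, not part of the original code.
final class InternDemo {

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctObjects(String[] values) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Collections.addAll(seen, values);
        return seen.size();
    }

    // Replaces every element with its canonical, interned instance.
    static void internAll(String[] values) {
        for (int i = 0; i < values.length; i++) {
            values[i] = values[i].intern();
        }
    }
}
```

For an array built from new String("abcd"), new String("abcd"), and new String("wxyz"), distinctObjects reports three objects before internAll and two afterwards, since equal Strings then share one instance.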

If either of those conditions is false, you are probably correct to build a custom pool. But my recommendation is that you consider a simple HashMap in place of the WeakHashMap you're currently using. You probably don't want these values to be garbage-collected while they're in your cache, and WeakHashMap adds another level of indirection (and the associated object pointers), increasing memory consumption further.
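
A minimal sketch of such a HashMap-based pool might look like the following. It assumes single-threaded use and that pooled Strings should live as long as the pool itself. The name SimpleStringPool is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pool backed by a plain HashMap; not thread-safe.
final class SimpleStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled instance equal to s, adding s to the pool if absent.
    String get(String s) {
        String pooled = pool.get(s);
        if (pooled == null) {
            pool.put(s, s);
            pooled = s;
        }
        return pooled;
    }

    // Number of unique Strings currently held.
    int size() {
        return pool.size();
    }
}
```

Because the pool holds strong references, entries are only released when the pool itself becomes unreachable, for example by dropping it once the index has been built.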

Update: I'm told that JDK 7 stores interned Strings (String.intern()) in the main heap, not in perm-gen, as earlier JDKs did. That makes String.intern() less risky if you're using JDK 7.
