When is it beneficial to flyweight Strings in Java?


Question

I understand the basic idea of Java's String interning, but I'm trying to figure out in which situations it happens automatically, and in which I would need to do my own flyweighting.

Somewhat related:

  • Java Strings: "String s = new String("silly");"
  • The best alternative for String flyweight implementation in Java never quite got answered

Together they tell me that String s = "foo" is good and String s = new String("foo") is bad but there's no mention of any other situations.
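To make the difference between the two forms concrete, here is a minimal sketch that compares object identity with ==. The class name LiteralVsNew and the helper sameObject are made up for illustration.

```java
class LiteralVsNew {
    // Identity comparison on purpose: true only if both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = "foo";             // literal, taken from the string pool
        String b = "foo";             // same literal, same pooled object
        String c = new String("foo"); // always allocates a new String object

        System.out.println(sameObject(a, b));          // true
        System.out.println(sameObject(a, c));          // false
        System.out.println(a.equals(c));               // true
        System.out.println(sameObject(a, c.intern())); // true
    }
}
```

The contents are equal in every case; only the number of String objects on the heap differs.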

In particular, if I parse a file (say a CSV) that has a lot of repeated values, will Java's string interning cover me or do I need to do something myself? I've gotten conflicting advice about whether or not String interning applies here in my other question.

The full answer came in several fragments, so I'll sum up here:

By default, Java only interns strings that are known at compile time (literals and constant expressions). String.intern() can be used at runtime, but it doesn't perform very well, so it's only appropriate for smaller numbers of Strings that you're sure will be repeated a lot. For larger sets of Strings it's Guava to the rescue (see ColinD's answer).
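As a small illustration of that summary, the sketch below contrasts a compile-time constant with values produced at runtime by splitting a line. The class name RuntimeStrings, the helper parse and the sample line are assumptions for this example, and the naive split(",") ignores CSV quoting.

```java
class RuntimeStrings {
    // Stand-in for a parser: the resulting strings are created at runtime.
    static String[] parse(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Constant expression: folded by the compiler and interned like a literal.
        System.out.println(("re" + "d") == "red");                  // true

        String[] cells = parse("red,green,red");
        // Equal contents, but two separate objects: no automatic interning here.
        System.out.println(cells[0].equals(cells[2]));              // true
        System.out.println(cells[0] == cells[2]);                   // false

        // Explicit interning maps both to one canonical instance.
        System.out.println(cells[0].intern() == cells[2].intern()); // true
    }
}
```

So repeated values read from a file are not shared unless the code canonicalizes them itself, whether through intern() or through a pool of its own.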

Recommended answer

Don't use String.intern() in your code, at least not if you might get 20 or more different strings. In my experience, using String.intern() slows down the whole application when you have a few million strings.

To avoid duplicated String objects, just use a HashMap.

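// Needs: import java.util.HashMap; import java.util.List; import java.util.Map;

// Maps each distinct value to the first String instance seen with that value.
private final Map<String, String> pool = new HashMap<String, String>();

// Returns the pooled instance equal to s, adding s to the pool if it is new.
private String interned(String s) {
  String interned = pool.get(s);
  if (interned != null) {
    return interned;
  }
  pool.put(s, s);
  return s;
}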

private void readFile(CsvFile csvFile) {
  for (List<String> row : csvFile) {
    for (int i = 0; i < row.size(); i++) {
      row.set(i, interned(row.get(i)));
      // further process the row
    }
  }
  pool.clear(); // allow the garbage collector to clean up
}

With that code you can avoid duplicate strings within one CSV file. If you need to avoid them on a larger scale (for example across several files), keep the pool alive longer and call pool.clear() at a later point instead.
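For readers who want to try the idea in isolation, here is a self-contained sketch of the same HashMap-based pool. The class name CsvStringPool, the readLines helper and the sample data are hypothetical, and the naive split(",") stands in for a real CSV parser, so treat it as an illustration under those assumptions rather than a finished implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CsvStringPool {
    // Maps each distinct value to the first instance seen with that value.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled instance equal to s, registering s if it is new.
    String interned(String s) {
        String previous = pool.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    int size() {
        return pool.size();
    }

    // Drops all references so the garbage collector can reclaim unused strings.
    void clear() {
        pool.clear();
    }

    // Splits each line on commas (no quoting support) and pools every cell.
    List<List<String>> readLines(List<String> lines) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            List<String> row = new ArrayList<>();
            for (String cell : line.split(",")) {
                row.add(interned(cell));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        CsvStringPool pool = new CsvStringPool();
        List<List<String>> rows =
            pool.readLines(Arrays.asList("DE,Berlin", "DE,Hamburg", "FR,Paris"));

        // Both "DE" cells now refer to the same String object.
        System.out.println(rows.get(0).get(0) == rows.get(1).get(0)); // true
        System.out.println(pool.size());                               // 5
        pool.clear();
    }
}
```

Because the pool holds strong references, everything it has seen stays reachable until clear() is called, which is why the answer clears it once the file has been processed.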
