How to detect whether String.substring copies the character data

Question

I know that for Oracle Java 1.7 update 6 and newer, String.substring copies the internal character array of the String, and that older versions share it. But I found no official API that would tell me the current behavior.

Use Case

My use case is this: in a parser, I would like to detect whether String.substring copies or shares the underlying character array. If the character array is shared, my parser needs to explicitly "un-share" the result using new String(s) to avoid memory problems. If String.substring copies the data anyway, this is unnecessary, and the explicit copy in the parser could be avoided. For example:

// possibly the query is very very large
String query = "select * from test ...";
// the identifier is used outside of the parser
String identifier = query.substring(14, 18);

// avoid if possible for speed,
// but needed if identifier internally 
// references the large query char array
identifier = new String(identifier);

What I Need

Basically, I would like to have a static method boolean isSubstringCopyingForSure() that would detect if new String(..) is not needed. I'm OK if detection doesn't work if there is a SecurityManager. Basically, the detection should be conservative (to avoid memory problems, I'd rather use new String(..) even if not necessary).

Options

I have a few options, but I'm not sure whether they are reliable, especially for non-Oracle JVMs:

Checking for the String.offset field

import java.lang.reflect.Field;

/**
 * @return true if substring is copying, false if not or if it is not clear
 */
static boolean isSubstringCopyingForSure() {
    if (System.getSecurityManager() != null) {
        // we can not reliably check it
        return false;
    }
    try {
        for (Field f : String.class.getDeclaredFields()) {
            if ("offset".equals(f.getName())) {
                return false;
            }
        }
        return true;
    } catch (Exception e) {
        // weird, we do have a security manager?
    }
    return false;
}

Checking the JVM version

static boolean isSubstringCopyingForSure() {
    // but what about non-Oracle JREs?
    return System.getProperty("java.vendor").startsWith("Oracle") &&
           System.getProperty("java.version").compareTo("1.7.0_45") >= 0;
}
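
The lexicographic compareTo in the version check above misorders some version strings (for example, "1.7.0_100" sorts before "1.7.0_45"). A minimal sketch of a numeric comparison follows. The class name SubstringVersionCheck and its method names are hypothetical, the threshold of update 6 is the one stated in the question, and the parsing assumes Oracle/OpenJDK-style java.version strings ("1.7.0_45" before Java 9, "11.0.2" afterwards). Any other format is treated as unknown and reported as false, and the vendor question is left open.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compares java.version numerically instead of lexicographically. */
final class SubstringVersionCheck {

    // Legacy scheme used up to Java 8, e.g. "1.7.0_06" or "1.7.0_45-b18".
    private static final Pattern LEGACY =
            Pattern.compile("^1\\.(\\d+)\\.\\d+(?:_(\\d+))?.*$");

    // Scheme used since Java 9, e.g. "9", "11.0.2", "17-ea".
    private static final Pattern MODERN = Pattern.compile("^(\\d+).*$");

    /** Returns true only if the version string clearly denotes 7u6 or later. */
    static boolean isAtLeast7u6(String version) {
        if (version == null) {
            return false;
        }
        try {
            Matcher legacy = LEGACY.matcher(version);
            if (legacy.matches()) {
                int major = Integer.parseInt(legacy.group(1));
                String updateText = legacy.group(2);
                int update = (updateText == null) ? 0 : Integer.parseInt(updateText);
                return major > 7 || (major == 7 && update >= 6);
            }
            Matcher modern = MODERN.matcher(version);
            if (modern.matches()) {
                return Integer.parseInt(modern.group(1)) >= 9;
            }
        } catch (NumberFormatException e) {
            // Digits too long to parse: fall through to the conservative answer.
        }
        return false; // unknown format, so stay conservative
    }

    /** Applies the check to the running JVM. */
    static boolean currentJvmIsAtLeast7u6() {
        return isAtLeast7u6(System.getProperty("java.version"));
    }
}
```

Returning false for anything unrecognized keeps the check conservative in the sense the question asks for: an unnecessary new String(..) is preferred over a retained large array.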

Checking the behavior

There are two options, and both are rather complicated. One is to create a string using a custom charset, create a new string b from it using substring, then modify the original string and check whether b has also changed. The other is to create a huge string, take a few substrings of it, and check the memory usage.

Answer

Right, indeed this change was made in 7u6. There is no API change for this, as this change is strictly an implementation change, not an API change, nor is there an API to detect which behavior the running JDK has. However, it is certainly possible for applications to notice a difference in performance or memory utilization because of the change. In fact, it's not difficult to write a program that works in 7u4 but fails in 7u6 and vice-versa. We expect that the tradeoff is favorable for the majority of applications, but undoubtedly there are applications that will suffer from this change.

It's interesting that you're concerned about the case where string values are shared (prior to 7u6). Most people I've heard from have the opposite concern, where they like the sharing and the 7u6 change to unshared values is causing them problems (or, they're afraid it will cause problems).

In any case the thing to do is measure, not guess!

First, compare the performance of your application between similar JDKs with and without the change, e.g. 7u4 and 7u6. Probably you should be looking at GC logs or other memory monitoring tools. If the difference is acceptable, you're done!

Assuming that the shared string values prior to 7u6 cause a problem, the next step is to try the simple workaround of new String(s.substring(...)) to force the string value to be unshared. Then measure that. Again, if the performance is acceptable on both JDKs, you're done!

If it turns out that in the unshared case, the extra call to new String() is unacceptable, then probably the best way to detect this case and make the "unsharing" call conditional is to reflect on a String's value field, which is a char[], and get its length:

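import java.lang.reflect.Field;

// Reads the private char[] that backs the string and returns its length.
int getValueLength(String s) throws Exception {
    Field field = String.class.getDeclaredField("value");
    field.setAccessible(true); // the field is private, so access must be forced
    return ((char[])field.get(s)).length;
}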

Consider a string resulting from a call to substring() that returns a string shorter than the original. In the shared case, the substring's length() will differ from the length of the value array retrieved as shown above. In the unshared case, they'll be the same. For example:

String s = "abcdefghij".substring(2, 5);
int logicalLength = s.length();
int valueLength = getValueLength(s);

System.out.printf("%d %d ", logicalLength, valueLength);
if (logicalLength != valueLength) {
    System.out.println("shared");
} else {
    System.out.println("unshared");
}

On JDKs older than 7u6, the value's length will be 10, whereas on 7u6 or later, the value's length will be 3. In both cases, of course, the logical length will be 3.
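
Putting the value-length comparison together with the conservative isSubstringCopyingForSure() contract from the question might look like the sketch below. The class name SubstringSharingProbe and the unshare helper are hypothetical, and the code assumes that a private field named value exists on String. Wherever that field is missing, is not a char[] (as on Java 9 and later, where it is a byte[]), or cannot be accessed reflectively, the probe reports false, so the explicit copy is still made.

```java
import java.lang.reflect.Field;

/** Hypothetical helper: probes once whether substring copies its character data. */
final class SubstringSharingProbe {

    // Computed once at class load, because reflection is comparatively slow.
    private static final boolean COPYING = probe();

    /** @return true only if substring was observed to copy; false if shared or unclear */
    static boolean isSubstringCopyingForSure() {
        return COPYING;
    }

    /** Returns s itself when substring is known to copy, otherwise an explicit copy. */
    static String unshare(String s) {
        return COPYING ? s : new String(s);
    }

    private static boolean probe() {
        try {
            String sub = "abcdefghij".substring(2, 5);
            Field field = String.class.getDeclaredField("value");
            field.setAccessible(true); // may throw on JVMs that restrict reflection
            Object value = field.get(sub);
            if (!(value instanceof char[])) {
                return false; // unexpected internal layout, so the result is unclear
            }
            // Unshared: the backing array is exactly as long as the substring.
            return ((char[]) value).length == sub.length();
        } catch (Exception e) {
            // No such field, access denied, or a SecurityManager objected.
            return false;
        }
    }
}
```

In the parser from the question, the unconditional copy would then become identifier = SubstringSharingProbe.unshare(query.substring(14, 18)). As the answer stresses, whether this is worth the reflection is something to measure first.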
