Size of initialisation string in Java


Question

Apparently there is a limit on the size of an initialisation string in javac. Can anyone help me identify what the maximum is?

Thanks

We are building an initialisation string that looks something like "{1,2,3,4,5,6,7,8......}", ideally with 10,000 numbers. With 1,000 numbers it works; with 10,000 it fails with the error "code too large for try statement".

To produce the string we use a StringBuilder and loop over an array, appending the values. Apparently this is a limitation in javac. We have been told that if we pass the data in small chunks, the array could be rebuilt in the method we are invoking. That is not possible, however, because we don't have control over the user method we are invoking.

I would like to post the code but can't, because this is a university project. I am not looking for a code solution, just some help in understanding what the actual problem is.

It's the for loop that is the offender:

    Object o = new Object() 
    { 
        public String toString() 
        { 
            StringBuilder s = new StringBuilder();
            int length = MainInterfaceProcessor.this.valuesFromData.length;
            Object[] arrayToProcess = MainInterfaceProcessor.this.valuesFromData;

            if(length == 0)
            {
                //throw exception to do
            }
            else if(length == 1)
            {
                s.append("{" + Integer.toString((Integer)arrayToProcess[0])+"}");
            }
            else
            {
                s.append("{" + Integer.toString((Integer)arrayToProcess[0])+","); //opening statement
                for(int i = 1; i < length; i++)
                {
                    if(i == (length - 1))
                    {
                        //last element in the array, so don't add a comma at the end
                        s.append(getArrayItemAsString(arrayToProcess, i)+"}");
                        break;
                    }       
                    //append each array value at position i, followed
                    //by a comma to separate the values
                    s.append(getArrayItemAsString(arrayToProcess, i)+ ",");
                }
            }
            return s.toString();
        }
    };
    try 
    {
        Object result = method.invoke(obj, new Object[] { o });
    }
    // (the catch block is not shown in the original post)

Recommended answer

The length of a string literal (i.e. "...") is limited by the class file format's CONSTANT_Utf8_info structure, which is referenced by the CONSTANT_String_info structure.

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

The limiting factor here is the length field, which is only 2 bytes wide and so has a maximum value of 65535. This number is the number of bytes in the modified UTF-8 representation of the string (this is almost CESU-8, except that the 0 character is also represented in a two-byte form).

So a pure ASCII string literal can have up to 65535 characters, while a string consisting of characters in the range U+0800 ... U+FFFF can have only one third as many (21845), because each of those takes 3 bytes. Characters from U+10000 to U+10FFFF are encoded in modified UTF-8 as surrogate pairs and take up 6 bytes each.
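
The byte counts above can be checked with a short helper. This is a minimal sketch, not part of the original answer: the class and method names (`ModifiedUtf8`, `encodedLength`, `fitsInStringLiteral`) are made up for illustration, and the rules it applies are the modified UTF-8 rules described above (1 byte for U+0001 to U+007F, 2 bytes for U+0000 and U+0080 to U+07FF, 3 bytes for every other UTF-16 code unit, including each half of a surrogate pair).

```java
// Sketch: count the bytes a string would occupy in modified UTF-8,
// the encoding used by CONSTANT_Utf8_info. Names are illustrative only.
class ModifiedUtf8 {
    /** Largest value that fits in the u2 length field. */
    static final int MAX_BYTES = 65535;

    static int encodedLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);      // one UTF-16 code unit
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;            // ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;            // NUL and U+0080..U+07FF
            } else {
                bytes += 3;            // U+0800..U+FFFF, including each surrogate
            }
        }
        return bytes;
    }

    static boolean fitsInStringLiteral(String s) {
        return encodedLength(s) <= MAX_BYTES;
    }
}
```

Because the loop walks UTF-16 code units, a supplementary character counts as two surrogates of 3 bytes each, which gives the 6 bytes mentioned above.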

(The same limit applies to identifiers, i.e. class, method and variable names, and to their type descriptors, since they use the same structure.)

The Java Language Specification does not mention any limit for string literals:

A string literal consists of zero or more characters enclosed in double quotes.

So in principle a compiler could split a longer string literal into more than one CONSTANT_String_info structure and reconstruct it at runtime by concatenation (and by calling .intern() on the result). I have no idea whether any compiler actually does this.

It turns out that the problem is not related to string literals but to array initializers.

An object passed to BMethod.invoke (and similarly to BConstructor.newInstance) can be a BObject (a wrapper around an existing object, in which case the wrapped object is passed), a String (which is passed as is), or anything else. In the last case, the object is converted to a string (via toString()), and this string is then interpreted as a Java expression.

To do this, BlueJ wraps the expression in a class/method and compiles that method. In the method, the array initializer is simply converted to a long list of array assignments, and this finally makes the method longer than the maximum bytecode size of a Java method:

The value of the code_length item must be less than 65536.

This is why it breaks for longer arrays.
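
A rough estimate shows why 1,000 elements fit and 10,000 do not. The sketch below is an illustration under stated assumptions, not a description of BlueJ's actual output: it assumes each element is stored the way javac typically compiles an int array initializer (dup, push index, push value, iastore) and that the values are 1..n as in the question. The class and method names are hypothetical.

```java
// Sketch: estimate the bytecode size of "new int[n]" followed by one
// store per element, with values 1..n. Illustrative names and assumptions.
class ArrayInitializerSize {
    /** code_length must be less than 65536. */
    static final int MAX_CODE_LENGTH = 65535;

    /** Size in bytes of the instruction that pushes an int constant. */
    static int pushSize(int v) {
        if (v >= -1 && v <= 5) return 1;                             // iconst_m1 .. iconst_5
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) return 2;    // bipush
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) return 3;  // sipush
        return 3;                                                    // ldc_w (upper bound; ldc is 2)
    }

    /** Estimated bytes for creating the array and storing i + 1 at each index i. */
    static int estimate(int n) {
        int bytes = pushSize(n) + 2;      // push the length, then newarray (opcode + type)
        for (int i = 0; i < n; i++) {
            bytes += 1                    // dup
                   + pushSize(i)          // index
                   + pushSize(i + 1)      // value
                   + 1;                   // iastore
        }
        return bytes;
    }

    static boolean fitsInOneMethod(int n) {
        return estimate(n) <= MAX_CODE_LENGTH;
    }
}
```

With up to 8 bytes per element once indices and values need sipush, 10,000 elements exceed the limit on their own, before any surrounding code in the generated method is counted.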

So, to pass larger arrays, we have to find some other way to get them into BMethod.invoke. The BlueJ extension API has no way to create or access arrays wrapped in a BObject.

One idea that came up in chat is this:

  1. Create a new class inside the project (or in a new project, if they can interoperate), something like this:

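import java.util.ArrayList;
import java.util.List;

public class IntArrayBuilder {
    // Initialise the list here; otherwise addElement would throw a NullPointerException.
    private final List<Integer> list = new ArrayList<>();

    public void addElement(int el) {
        list.add(el);
    }

    // Copies the collected values into a new int[].
    public int[] makeArray() {
        int[] array = new int[list.size()];
        for (int i = 0; i < array.length; i++) {
            array[i] = list.get(i);
        }
        return array;
    }
}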

(This covers the case of creating an int[]. If you need other types of array too, it can be made more generic. It could also be made more efficient by using an internal int[] as storage, enlarging it occasionally as it grows, with makeArray doing a final arraycopy. This is a sketch, so it is the simplest implementation.)

  2. From our extension, create an object of this class and add elements to it by calling its .addElement method.

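// bPackage is assumed to be the BlueJ package object that contains IntArrayBuilder
// (the original used the name "package", which is a reserved word in Java).
BObject arrayToBArray(int[] a) {
    BClass builderClass = bPackage.getClass("IntArrayBuilder");
    BObject builder = builderClass.getConstructor(new Class<?>[0]).newInstance(new Object[0]);
    BMethod addMethod = builderClass.getMethod("addElement", new Class<?>[]{int.class});
    for(int e : a) {
        addMethod.invoke(builder, new Object[]{ e });
    }
    // Look up makeArray here (the original looked up addElement a second time by mistake).
    BMethod makeMethod = builderClass.getMethod("makeArray", new Class<?>[0]);
    BObject bArray = (BObject)makeMethod.invoke(builder, new Object[0]);
    return bArray;
}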

(For efficiency, the BClass/BMethod objects could be retrieved once and cached instead of being looked up again for each array conversion.)
If you generate the array's contents by some algorithm, you can do that generation here instead of first creating another wrapping object.

  3. In our extension, call the method we actually want to call with the long array, passing our wrapped array:

Object result = method.invoke(obj, new Object[] { bArray });
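
The more efficient builder variant mentioned above (an internal int[] that is enlarged as it grows, with a final arraycopy in makeArray) could look roughly like this. It is a sketch only: the class name `GrowingIntArrayBuilder`, the initial capacity of 16, and the doubling strategy are choices made for illustration, not part of the original answer.

```java
import java.util.Arrays;

// Sketch of a builder backed by a growing int[] instead of an ArrayList<Integer>.
// Class name, initial capacity and growth factor are illustrative assumptions.
class GrowingIntArrayBuilder {
    private int[] storage = new int[16];
    private int size = 0;

    public void addElement(int el) {
        if (size == storage.length) {
            // Storage is full: double its capacity.
            storage = Arrays.copyOf(storage, storage.length * 2);
        }
        storage[size++] = el;
    }

    public int[] makeArray() {
        // Return a copy trimmed to the number of elements actually added.
        int[] array = new int[size];
        System.arraycopy(storage, 0, array, 0, size);
        return array;
    }
}
```

This avoids boxing every element into an Integer. To use it from the extension in place of IntArrayBuilder, the class would need to be declared public in the user's project and looked up under its own name.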
