Size of Initialisation string in java


Question

Apparently there is a limit to the size of an initialisation string in javac. Can anyone help me in identifying what the maximum limit is please?

Thank you

edit:

We are building an initialisation string which will look something like this: "{1,2,3,4,5,6,7,8......}", but ideally with 10,000 numbers. When we do this for 1,000 numbers it works; with 10,000 it throws an error saying "code too large for try statement".

To produce this we are using a StringBuilder and looping over an array, appending the values. Apparently it is a limitation in javac. We have been told that we could rebuild the array in the method we are invoking if we pass it in small chunks. This, however, is not possible because we don't have control over the user method we are invoking.

I would like to post code but can't, because this is a project for University. I am not looking for code solutions, just some help in understanding what the actual problem here is.

It's the for loop which is the offender:

    Object o = new Object() 
    { 
        public String toString() 
        { 
            StringBuilder s = new StringBuilder();
            int length = MainInterfaceProcessor.this.valuesFromData.length;
            Object[] arrayToProcess = MainInterfaceProcessor.this.valuesFromData;

            if(length == 0)
            {
                //throw exception to do
            }
            else if(length == 1)
            {
                s.append("{" + Integer.toString((Integer)arrayToProcess[0])+"}");
            }
            else
            {
                s.append("{" + Integer.toString((Integer)arrayToProcess[0])+","); //opening statement
                for(int i = 1; i < length; i++)
                {
                    if(i == (length - 1))
                    {
                        //last element in the array, so don't add a comma at the end
                        s.append(getArrayItemAsString(arrayToProcess, i)+"}");
                        break;
                    }       
                    //append each array value at position i, followed
                    //by a comma to separate the values
                    s.append(getArrayItemAsString(arrayToProcess, i)+ ",");
                }
            }
            return s.toString();
        }
    };
    try 
    {
        Object result = method.invoke(obj, new Object[] { o });

    }

Answer

The length of a String literal (i.e. "...") is limited by the class file format's CONSTANT_Utf8_info structure, which is referred to by the CONSTANT_String_info structure.

    CONSTANT_Utf8_info {
        u1 tag;
        u2 length;
        u1 bytes[length];
    }

The limiting factor here is the length item, which is only 2 bytes wide, i.e. has a maximum value of 65535. This number corresponds to the number of bytes in the modified UTF-8 representation of the string (this is actually almost CESU-8, but the 0 character is also represented in a two-byte form).

So, a pure ASCII string literal can have up to 65535 characters, while a string consisting of characters in the range U+0800 ... U+FFFF can have only one third as many (21845), since each of them takes 3 bytes. Characters outside the Basic Multilingual Plane (i.e. U+10000 to U+10FFFF), which are surrogate pairs in UTF-16, take up 6 bytes each, because each surrogate is encoded separately as 3 bytes.

(The same limit is there for identifiers, i.e. class, method and variable names, and type descriptors for these, since they use the same structure.)
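To see how this byte count works out for a particular string, the encoding rules above can be applied by hand. The following is a minimal sketch; the class and method names (ModifiedUtf8Length, encodedLength, fitsInClassFile) are made up for illustration and are not part of any library.

```java
// Hypothetical helper: counts the bytes a string would occupy in the
// modified UTF-8 encoding used by CONSTANT_Utf8_info.
final class ModifiedUtf8Length {

    // Largest value the 2-byte "length" item can hold.
    static final int MAX_UTF8_INFO_BYTES = 65535;

    static int encodedLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;              // plain ASCII except U+0000
            } else if (c <= 0x07FF) {
                bytes += 2;              // U+0000 and U+0080 .. U+07FF
            } else {
                bytes += 3;              // U+0800 .. U+FFFF, including each surrogate half
            }
        }
        return bytes;
    }

    static boolean fitsInClassFile(String s) {
        return encodedLength(s) <= MAX_UTF8_INFO_BYTES;
    }
}
```

A supplementary character such as U+1F600 is two chars in a Java String, so the loop counts it as 3 + 3 = 6 bytes, matching the figure given above.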

The Java Language Specification does not mention any limit for string literals:

A string literal consists of zero or more characters enclosed in double quotes.

So in principle a compiler could split a longer string literal into more than one CONSTANT_String_info structure and reconstruct it at runtime by concatenation (and .intern()-ing the result). I have no idea if any compiler is actually doing this.
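Note that the limit only concerns literals stored in the class file; a string assembled at runtime is not subject to it. As a rough, hand-written sketch of the "split and concatenate" idea (the names LongStringDemo, repeat and joinAndIntern are hypothetical, and this is not something a compiler is known to generate):

```java
// Hypothetical demo: builds a string longer than 65535 characters at
// runtime from smaller pieces, then interns the result.
final class LongStringDemo {

    // Returns a string consisting of 'count' copies of c.
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Concatenates the chunks and returns the canonical (interned) instance,
    // which is how a string literal would behave.
    static String joinAndIntern(String... chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString().intern();
    }
}
```

For example, joinAndIntern(repeat('a', 65535), repeat('b', 10)) yields a string of 65545 characters without any problem.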


As it turns out, the problem here does not relate to string literals, but to array initializers.

When passing an object to BMethod.invoke (and similarly to BConstructor.newInstance), it can be a BObject (i.e. a wrapper around an existing object; the wrapped object is then passed), a String (which is passed as is), or anything else. In the last case, the object is converted to a string (by toString()), and this string is then interpreted as a Java expression.

To do this, BlueJ wraps the expression in a class/method and compiles this method. In the compiled method, the array initializer is simply converted to a long list of array assignments, one per element, and this finally makes the method longer than the maximum bytecode size of a Java method:

The value of the code_length item must be less than 65536.

This is why it breaks for longer arrays.
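To get a feeling for the numbers, here is a rough estimate of the bytecode size of an int[] initializer. It assumes the usual instruction sequence for each element (dup, push index, push value, iastore) and ignores whatever wrapper code BlueJ adds around the expression; the class name InitializerSizeEstimate and its methods are made up for this sketch, and large constants are approximated as 3 bytes.

```java
// Hypothetical estimate of the bytecode needed for "new int[] { ... }".
final class InitializerSizeEstimate {

    // Bytes needed to push an int constant onto the operand stack.
    static int pushSize(int v) {
        if (v >= -1 && v <= 5) {
            return 1;                    // iconst_m1 .. iconst_5
        }
        if (v >= Byte.MIN_VALUE && v <= Byte.MAX_VALUE) {
            return 2;                    // bipush
        }
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            return 3;                    // sipush
        }
        return 3;                        // ldc_w (approximation; ldc would be 2)
    }

    // Estimated bytes for creating and filling the array.
    static long estimate(int[] values) {
        long size = pushSize(values.length) + 2;   // push length + newarray
        for (int i = 0; i < values.length; i++) {
            // dup, push index, push value, iastore
            size += 1 + pushSize(i) + pushSize(values[i]) + 1;
        }
        return size;
    }
}
```

Once both index and value exceed 127, each element costs about 8 bytes, so 1,000 elements stay well under 65536, while 10,000 elements need on the order of 80,000 bytes. That is consistent with the behaviour described in the question.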


So, to pass larger arrays, we have to find some other way to pass them to BMethod.invoke. The BlueJ extension API has no way to create or access arrays wrapped in a BObject.

One idea we found in chat is this:

  1. Create a new class inside the project (or in a new project, if they can interoperate), something like this:

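    import java.util.ArrayList;

    public class IntArrayBuilder {
        // must be initialized, otherwise addElement throws a NullPointerException
        private ArrayList<Integer> list = new ArrayList<Integer>();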
        public void addElement(int el) {
            list.add(el);
        }
        public int[] makeArray() {
            int[] array = new int[list.size()];
            for(int i = 0; i < array.length; i++) {
               array[i] = list.get(i);
            }
            return array;
        }
    }
    

    (This is for the case of creating an int[]. If you need other types of array, too, it can be made more generic. It could also be made more efficient by using an internal int[] as storage, enlarging it occasionally as it grows, with makeArray doing a final arraycopy. This is a sketch, so it is the simplest implementation.)

  2. From our extension, create an object of this class, and add elements to this object by calling its .addElement method.

    BObject arrayToBArray(int[] a) {
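        // pkg: the package object containing IntArrayBuilder
        // ("package" itself is a reserved word and cannot be used as a name)
        BClass builderClass = pkg.getClass("IntArrayBuilder");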
        BObject builder = builderClass.getConstructor(new Class<?>[0]).newInstance(new Object[0]);
        BMethod addMethod = builderClass.getMethod("addElement", new Class<?>[]{int.class});
        for(int e : a) {
            addMethod.invoke(builder, new Object[]{ e });
        }
        // look up makeArray() (no parameters) to retrieve the finished array
        BMethod makeMethod = builderClass.getMethod("makeArray", new Class<?>[0]);
        BObject bArray = (BObject)makeMethod.invoke(builder, new Object[0]);
        return bArray;
    }
    

    (For efficiency, the BClass/BMethod objects could be retrieved once and cached, instead of being looked up for each array conversion.)
    If you generate the array's contents by some algorithm, you can do this generation here instead of first creating another wrapping object.

  3. In our extension, call the method we actually want to call with the long array, passing our wrapped array:

    Object result = method.invoke(obj, new Object[] { bArray });
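The more efficient builder variant mentioned in step 1 (an internal int[] that grows as needed, with a final copy in makeArray) could look roughly like this. It is only a sketch: the class name GrowingIntArrayBuilder, the initial capacity of 16, and the doubling strategy are assumptions, not part of the original idea. For use inside a BlueJ project it would presumably be declared public in its own file.

```java
import java.util.Arrays;

// Hypothetical variant of IntArrayBuilder that avoids boxing by storing
// the elements in a plain int[] which is doubled whenever it is full.
final class GrowingIntArrayBuilder {

    private int[] storage = new int[16];   // assumed initial capacity
    private int size = 0;                  // number of elements added so far

    public void addElement(int el) {
        if (size == storage.length) {
            // storage is full: copy into an array twice as large
            storage = Arrays.copyOf(storage, storage.length * 2);
        }
        storage[size++] = el;
    }

    public int[] makeArray() {
        // final copy, trimmed to the number of elements actually added
        return Arrays.copyOf(storage, size);
    }
}
```

The public methods have the same names and signatures as in IntArrayBuilder, so the extension code from step 2 would only need the different class name.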
    
