Best way to handle huge fields with GSON JsonReader


Question

I'm getting a java.lang.OutOfMemoryError: Java heap space even with GSON's streaming API.

{"result":"OK","base64":"JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC...."}

The base64 value can be up to 200 MB long, yet GSON takes much more memory than that (3 GB). When I try to store the base64 value in a variable, I get:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuilder.append(StringBuilder.java:204)
    at com.google.gson.stream.JsonReader.nextQuotedValue(JsonReader.java:1014)
    at com.google.gson.stream.JsonReader.nextString(JsonReader.java:815)

What is the best way to handle fields like this?

Solution

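You're getting OutOfMemoryError because GSON's nextString() returns a String, and JsonReader builds that String by appending the value, buffer by buffer, to a StringBuilder. Each time the builder outgrows its char[] (two bytes per character), Arrays.copyOf allocates a bigger array while the old one is still alive, which is exactly where your stack trace fails, so a 200 MB value needs several times that much heap while it is being built. When you're facing such an issue, you have to process the data in intermediate pieces; there is no other choice. Unfortunately, GSON does not let you process huge literals that way.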
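To make "process the data in intermediate pieces" concrete outside GSON, here is a minimal stdlib sketch of the idea: it copies one JSON string literal from a Reader to a Writer in fixed-size chunks, so the value never exists as one String. The class JsonStringStreamer and its method are hypothetical names. The sketch assumes the reader is already positioned just after the opening quote, handles only the standard JSON escapes, and does not reject unescaped control characters, so it is not a full JSON parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper (not part of GSON): copies one JSON string literal from `in`
// to `out` in fixed-size chunks, so the value is never held as a single String.
// Assumes `in` is positioned just after the opening quote; wrap it in a
// BufferedReader, since this reads one char at a time.
final class JsonStringStreamer {

    private JsonStringStreamer() {
    }

    // Returns the number of decoded chars written; leaves `in` right after the closing quote.
    static long copyStringValue(final Reader in, final Writer out) throws IOException {
        final char[] chunk = new char[8192];
        int used = 0;
        long total = 0;
        while (true) {
            final int c = in.read();
            if (c == -1) {
                throw new IOException("Unterminated string");
            }
            if (c == '"') {
                out.write(chunk, 0, used);  // flush the tail of the value
                return total;
            }
            chunk[used++] = c == '\\' ? readEscape(in) : (char) c;
            total++;
            if (used == chunk.length) {  // chunk full: hand it over and reuse it
                out.write(chunk, 0, used);
                used = 0;
            }
        }
    }

    // Decodes the character that follows a backslash, per the JSON escape rules.
    private static char readEscape(final Reader in) throws IOException {
        final int e = in.read();
        switch (e) {
        case '"':
        case '\\':
        case '/':
            return (char) e;
        case 'b':
            return '\b';
        case 'f':
            return '\f';
        case 'n':
            return '\n';
        case 'r':
            return '\r';
        case 't':
            return '\t';
        case 'u': {
            int value = 0;
            for (int i = 0; i < 4; i++) {  // exactly four hex digits
                final int digit = Character.digit(in.read(), 16);
                if (digit < 0) {
                    throw new IOException("Malformed unicode escape");
                }
                value = (value << 4) | digit;
            }
            return (char) value;
        }
        default:
            throw new IOException("Invalid escape character: " + e);
        }
    }
}
```

Memory use stays at one 8192-char chunk regardless of the value's length; the price is that all the surrounding tokenizing has to be written by hand too.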

I'm not sure whether you can change the response payload, but if you can't, you might want to implement your own JSON reader, or "hack" the existing JsonReader to make it work in a streaming fashion. The example below is based on GSON 2.5 and makes heavy use of reflection, because JsonReader hides its state very carefully.

EnhancedGson25JsonReader.java

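import java.io.IOException;
import java.io.Reader;

import com.google.gson.stream.JsonReader;

// Must live in the same package as FieldSpy and MethodSpy (both are package-private).
final class EnhancedGson25JsonReader
        extends JsonReader {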

    // A listener that accepts slices of the internal character buffer.
    // Accepting a single String built from those slices would waste memory too.
    interface ISlicedStringListener {

        void accept(char[] buffer, int start, int length)
                throws IOException;

    }

    // The constants can simply be copied

    /** @see JsonReader#PEEKED_NONE */
    private static final int PEEKED_NONE = 0;

    /** @see JsonReader#PEEKED_SINGLE_QUOTED */
    private static final int PEEKED_SINGLE_QUOTED = 8;

    /** @see JsonReader#PEEKED_DOUBLE_QUOTED */
    private static final int PEEKED_DOUBLE_QUOTED = 9;

    // A bunch of "spies" that give access to the parent class's private state

    private final FieldSpy<Integer> peeked;
    private final MethodSpy<Integer> doPeek;
    private final MethodSpy<Integer> getLineNumber;
    private final MethodSpy<Integer> getColumnNumber;
    private final FieldSpy<char[]> buffer;
    private final FieldSpy<Integer> pos;
    private final FieldSpy<Integer> limit;
    private final MethodSpy<Character> readEscapeCharacter;
    private final FieldSpy<Integer> lineNumber;
    private final FieldSpy<Integer> lineStart;
    private final MethodSpy<Boolean> fillBuffer;
    private final MethodSpy<IOException> syntaxError;
    private final FieldSpy<Integer> stackSize;
    private final FieldSpy<int[]> pathIndices;

    private EnhancedGson25JsonReader(final Reader reader)
            throws NoSuchFieldException, NoSuchMethodException {
        super(reader);
        peeked = FieldSpy.spyField(JsonReader.class, this, "peeked");
        doPeek = MethodSpy.spyMethod(JsonReader.class, this, "doPeek");
        getLineNumber = MethodSpy.spyMethod(JsonReader.class, this, "getLineNumber");
        getColumnNumber = MethodSpy.spyMethod(JsonReader.class, this, "getColumnNumber");
        buffer = FieldSpy.spyField(JsonReader.class, this, "buffer");
        pos = FieldSpy.spyField(JsonReader.class, this, "pos");
        limit = FieldSpy.spyField(JsonReader.class, this, "limit");
        readEscapeCharacter = MethodSpy.spyMethod(JsonReader.class, this, "readEscapeCharacter");
        lineNumber = FieldSpy.spyField(JsonReader.class, this, "lineNumber");
        lineStart = FieldSpy.spyField(JsonReader.class, this, "lineStart");
        fillBuffer = MethodSpy.spyMethod(JsonReader.class, this, "fillBuffer", int.class);
        syntaxError = MethodSpy.spyMethod(JsonReader.class, this, "syntaxError", String.class);
        stackSize = FieldSpy.spyField(JsonReader.class, this, "stackSize");
        pathIndices = FieldSpy.spyField(JsonReader.class, this, "pathIndices");
    }

    static EnhancedGson25JsonReader getEnhancedGson25JsonReader(final Reader reader) {
        try {
            return new EnhancedGson25JsonReader(reader);
        } catch ( final NoSuchFieldException | NoSuchMethodException ex ) {
            throw new RuntimeException(ex);
        }
    }

    // This method has been copied and reworked from the nextString() implementation

    void nextSlicedString(final ISlicedStringListener listener)
            throws IOException {
        int p = peeked.get();
        if ( p == PEEKED_NONE ) {
            p = doPeek.get();
        }
        switch ( p ) {
        case PEEKED_SINGLE_QUOTED:
            nextQuotedSlicedValue('\'', listener);
            break;
        case PEEKED_DOUBLE_QUOTED:
            nextQuotedSlicedValue('"', listener);
            break;
        default:
            throw new IllegalStateException("Expected a string but was " + peek()
                    + " at line " + getLineNumber.get()
                    + " column " + getColumnNumber.get()
                    + " path " + getPath()
            );
        }
        peeked.accept(PEEKED_NONE);
        pathIndices.get()[stackSize.get() - 1]++;
    }

    // The following method is also a copy-paste, patched to go through the "spies".
    // In principle it is the same as the original, but it uses one extra buffer, singleCharBuffer,
    // so that ISlicedStringListener needs no second method (and stays lambda-friendly).
    // The main difference between the two methods is that this one does not aggregate
    // a single string value; it just hands slices of the internal buffer to the call site,
    // which may do anything with them. The buffer is reused on the next fill, though,
    // so a listener must copy any data it wants to keep.

    /**
     * @see JsonReader#nextQuotedValue(char)
     */
    private void nextQuotedSlicedValue(final char quote, final ISlicedStringListener listener)
            throws IOException {
        final char[] buffer = this.buffer.get();
        final char[] singleCharBuffer = new char[1];
        while ( true ) {
            int p = pos.get();
            int l = limit.get();
            int start = p;
            while ( p < l ) {
                final int c = buffer[p++];
                if ( c == quote ) {
                    pos.accept(p);
                    listener.accept(buffer, start, p - start - 1);
                    return;
                } else if ( c == '\\' ) {
                    pos.accept(p);
                    listener.accept(buffer, start, p - start - 1);
                    singleCharBuffer[0] = readEscapeCharacter.get();
                    listener.accept(singleCharBuffer, 0, 1);
                    p = pos.get();
                    l = limit.get();
                    start = p;
                } else if ( c == '\n' ) {
                    lineNumber.accept(lineNumber.get() + 1);
                    lineStart.accept(p);
                }
            }
            listener.accept(buffer, start, p - start);
            pos.accept(p);
            if ( !fillBuffer.apply(just1) ) {
                throw syntaxError.apply(justUnterminatedString);
            }
        }
    }

    // Shared argument arrays, so the reflective calls above don't allocate on every invocation

    private static final Object[] just1 = { 1 };
    private static final Object[] justUnterminatedString = { "Unterminated string" };

}

FieldSpy.java

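import java.lang.reflect.Field;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Reads and writes one (possibly private) field of a given instance via reflection.
final class FieldSpy<T>
        implements Supplier<T>, Consumer<T> {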

    private final Object instance;
    private final Field field;

    private FieldSpy(final Object instance, final Field field) {
        this.instance = instance;
        this.field = field;
    }

    static <T> FieldSpy<T> spyField(final Class<?> declaringClass, final Object instance, final String fieldName)
            throws NoSuchFieldException {
        final Field field = declaringClass.getDeclaredField(fieldName);
        field.setAccessible(true);
        return new FieldSpy<>(instance, field);
    }

    @Override
    public T get() {
        try {
            @SuppressWarnings("unchecked")
            final T value = (T) field.get(instance);
            return value;
        } catch ( final IllegalAccessException ex ) {
            throw new RuntimeException(ex);
        }
    }

    @Override
    public void accept(final T value) {
        try {
            field.set(instance, value);
        } catch ( final IllegalAccessException ex ) {
            throw new RuntimeException(ex);
        }
    }

}
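A note on why spyField (and spyMethod) take JsonReader.class explicitly instead of calling getClass() on the instance: Class.getDeclaredField only finds fields declared by that exact class, never inherited ones, and getDeclaredMethod behaves the same way, so the lookup must target the declaring class. The stdlib sketch below demonstrates this with hypothetical ParentReader/ChildReader classes standing in for JsonReader and its subclass, and also shows that Field.set unboxes an Integer into a primitive int field, which is what lets a FieldSpy<Integer> write JsonReader's int fields.

```java
import java.lang.reflect.Field;

// Hypothetical stand-ins: ParentReader plays JsonReader, ChildReader plays the subclass.
class ParentReader {
    private int counter = 41;
}

final class ChildReader extends ParentReader {
}

final class DeclaredFieldDemo {

    private DeclaredFieldDemo() {
    }

    // True only if `type` itself declares the field; inherited fields are not found.
    static boolean declares(final Class<?> type, final String name) {
        try {
            type.getDeclaredField(name);
            return true;
        } catch (final NoSuchFieldException ex) {
            return false;
        }
    }

    // Increments the private ParentReader.counter of a ChildReader instance reflectively.
    static int bumpCounter(final ChildReader instance) throws ReflectiveOperationException {
        final Field field = ParentReader.class.getDeclaredField("counter");  // the declaring class
        field.setAccessible(true);
        final int current = (Integer) field.get(instance);  // get() boxes the int value
        field.set(instance, Integer.valueOf(current + 1));   // set() unboxes the Integer
        return (Integer) field.get(instance);
    }
}
```

Because the lookups are by name, the spies fail with NoSuchFieldException or NoSuchMethodException as soon as a GSON version renames or removes one of those private members.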

MethodSpy.java

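import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.function.Function;
import java.util.function.Supplier;

// Invokes one (possibly private) method on a given instance via reflection.
final class MethodSpy<T>
        implements Function<Object[], T>, Supplier<T> {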

    private static final Object[] emptyObjectArray = {};

    private final Object instance;
    private final Method method;

    private MethodSpy(final Object instance, final Method method) {
        this.instance = instance;
        this.method = method;
    }

    static <T> MethodSpy<T> spyMethod(final Class<?> declaringClass, final Object instance, final String methodName, final Class<?>... parameterTypes)
            throws NoSuchMethodException {
        final Method method = declaringClass.getDeclaredMethod(methodName, parameterTypes);
        method.setAccessible(true);
        return new MethodSpy<>(instance, method);
    }

    @Override
    public T get() {
        // Reuse one empty array: a no-argument varargs call would allocate a new Object[0] each time
        return apply(emptyObjectArray);
    }

    @Override
    public T apply(final Object[] arguments) {
        try {
            @SuppressWarnings("unchecked")
            final T value = (T) method.invoke(instance, arguments);
            return value;
        } catch ( final IllegalAccessException | InvocationTargetException ex ) {
            throw new RuntimeException(ex);
        }
    }

}
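One side effect of MethodSpy's design: Method.invoke wraps anything the target throws in an InvocationTargetException, so an IOException raised inside, say, readEscapeCharacter or fillBuffer reaches the caller of nextSlicedString as a RuntimeException rather than as the IOException the method declares. If that matters, the wrapper could unwrap the cause. The stdlib sketch below shows the idea; ReflectiveCalls.invokeUnwrapped and FlakySource are hypothetical names, not part of the code above.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ReflectiveCalls {

    private ReflectiveCalls() {
    }

    // Invokes `method` and rethrows the target's own IOException, RuntimeException
    // or Error instead of the InvocationTargetException wrapper around it.
    static Object invokeUnwrapped(final Method method, final Object instance, final Object... args)
            throws IOException {
        try {
            return method.invoke(instance, args);
        } catch (final InvocationTargetException ex) {
            final Throwable cause = ex.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw new RuntimeException(cause);  // any other checked exception
        } catch (final IllegalAccessException ex) {
            throw new RuntimeException(ex);
        }
    }
}

// Hypothetical stand-in for a private JsonReader method that may fail with an IOException.
final class FlakySource {
    private boolean fill(final int minimum) throws IOException {
        if (minimum > 0) {
            throw new IOException("End of input");
        }
        return true;
    }
}
```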

HugeJsonReaderDemo.java

And here is a demo that uses that method to read a huge JSON file and redirect its string values to another file.

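import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

import com.google.gson.stream.JsonToken;

final class HugeJsonReaderDemo {

    public static void main(final String... args)
            throws IOException {
        // Explicit UTF-8 instead of the platform's default charset
        try ( final EnhancedGson25JsonReader input = EnhancedGson25JsonReader.getEnhancedGson25JsonReader(
                      new InputStreamReader(new FileInputStream("./huge.json"), StandardCharsets.UTF_8));
              final Writer output = new OutputStreamWriter(
                      new BufferedOutputStream(new FileOutputStream("./huge.json.STRINGS")), StandardCharsets.UTF_8) ) {
            while ( input.hasNext() ) {
                final JsonToken token = input.peek();
                switch ( token ) {
                case BEGIN_OBJECT:
                    input.beginObject();
                    break;
                case NAME:
                    input.nextName();
                    break;
                case STRING:
                    // Each slice of the value goes straight to the file; no String is built
                    input.nextSlicedString(output::write);
                    break;
                default:
                    throw new AssertionError(token);
                }
            }
        }
    }

}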
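Because the field in the question is base64 (the sample prefix JVBERi0xLjQK decodes to "%PDF-1.4" followed by a newline, i.e. a PDF header), a call site does not even have to write the characters out: it can decode them on the fly. Below is a stdlib sketch of a hypothetical sink, Base64SliceDecoder, whose accept method has the same shape as ISlicedStringListener. Base64 can only be decoded in whole 4-character groups, so it carries at most three leftover characters from one slice to the next. It assumes the value is plain (not MIME) base64 with no whitespace or line breaks.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical listener: decodes base64 slices into `out` as they arrive.
// accept(char[], int, int) has the ISlicedStringListener shape.
final class Base64SliceDecoder {

    private final OutputStream out;
    private final Base64.Decoder decoder = Base64.getDecoder();
    private final char[] pending = new char[3];  // leftover chars of an incomplete 4-char group
    private int pendingLength;

    Base64SliceDecoder(final OutputStream out) {
        this.out = out;
    }

    void accept(final char[] buffer, final int start, final int length) throws IOException {
        final int available = pendingLength + length;
        final int whole = available - available % 4;  // chars that form complete groups
        final byte[] ascii = new byte[whole];
        for (int k = 0; k < available; k++) {
            final char c = k < pendingLength ? pending[k] : buffer[start + k - pendingLength];
            if (c > 0x7F) {
                throw new IOException("Non-ASCII character in base64 data");
            }
            if (k < whole) {
                ascii[k] = (byte) c;
            } else {
                pending[k - whole] = c;  // carried over to the next slice
            }
        }
        pendingLength = available - whole;
        if (whole > 0) {
            out.write(decoder.decode(ascii));  // IllegalArgumentException on invalid base64
        }
    }

    // Call once the string value has ended; a partial group means truncated input.
    void finish() throws IOException {
        if (pendingLength != 0) {
            throw new IOException("Truncated base64 value");
        }
        out.flush();
    }
}
```

In the demo, one could pass `decoder::accept` instead of `output::write` when the current name is base64, then call `decoder.finish()` afterwards. The decoded bytes go straight to the OutputStream, so heap usage stays bounded by the size of a single slice.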

I successfully extracted the values above into a file. The input file was 544 MB (570,425,371 bytes) long and was generated from the following JSON chunks:

  • {"result":"OK","base64":"
  • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC × 16777216 (2^24)
  • "}
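To regenerate such an input, a stdlib sketch like the following writes the three chunks with the middle one repeated, streaming everything so memory stays flat. HugeJsonGenerator and its methods are hypothetical names. With repeats = 2^24, the size works out to 25 bytes of prefix + 34 × 16,777,216 = 570,425,344 bytes of repeated base64 + 2 bytes of suffix, i.e. exactly the 570,425,371 bytes quoted above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical generator for a test input shaped like the one described above.
final class HugeJsonGenerator {

    static final String PREFIX = "{\"result\":\"OK\",\"base64\":\"";
    static final String CHUNK = "JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC";
    static final String SUFFIX = "\"}";

    private HugeJsonGenerator() {
    }

    // All characters are ASCII, so the file size in bytes equals the character count.
    static long expectedSize(final long repeats) {
        return PREFIX.length() + CHUNK.length() * repeats + SUFFIX.length();
    }

    // Streams the chunks to `target`; the repeated value is never held in memory.
    static void generate(final Path target, final long repeats) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            out.write(PREFIX);
            for (long i = 0; i < repeats; i++) {
                out.write(CHUNK);
            }
            out.write(SUFFIX);
        }
    }
}
```

For example, `HugeJsonGenerator.generate(Paths.get("./huge.json"), 1L << 24)` would write a file of that size.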

And the result is (since I just redirect all strings to the file):

  • OK
  • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC × 16777216 (2^24)

I think you've run into a very interesting issue. It would be nice to get some feedback from the GSON team on a possible API enhancement.
