NegativeArraySizeException when creating a SequenceFile with large (>1GB) BytesWritable value size


Problem description


I have tried different ways to create a large Hadoop SequenceFile with a single short (<100 bytes) key but one large (>1 GB) value (BytesWritable).

The following sample works out of the box:

https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/BigMapOutput.java

which writes multiple keys and values of random length, with a total size >3 GB.

However, that is not what I am trying to do, so I modified it using the Hadoop 2.2.0 API to something like:

      Path file = new Path("/input");
      SequenceFile.Writer writer = SequenceFile.createWriter(conf,
          SequenceFile.Writer.file(file),
          SequenceFile.Writer.compression(CompressionType.NONE),
          SequenceFile.Writer.keyClass(BytesWritable.class),
          SequenceFile.Writer.valueClass(BytesWritable.class));
      int numBytesToWrite = fileSizeInMB * 1024 * 1024;
      BytesWritable randomKey = new BytesWritable();
      BytesWritable randomValue = new BytesWritable();
      randomKey.setSize(1);
      randomValue.setSize(numBytesToWrite);
      randomizeBytes(randomValue.getBytes(), 0, randomValue.getLength());
      writer.append(randomKey, randomValue);
      writer.close();

When fileSizeInMB > 700, I get errors like:

java.lang.NegativeArraySizeException
        at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:144)
        at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:123)
        ...

I have seen this error discussed, but no resolution. Note that an int can be as large as 2^31 - 1 (about 2 GB), so it should not fail at 700 MB.

If you have an alternative way to create such a large-value SequenceFile, please advise. I tried other approaches, such as IOUtils.read from an input stream into a byte[], but I got heap-size errors or an OOME.

Solution

Just use ArrayPrimitiveWritable instead.

There is an int overflow when the new capacity is set in BytesWritable:

public void setSize(int size) {
    if (size > getCapacity()) {
        setCapacity(size * 3 / 2);
    }
    this.size = size;
}

700 MB * 3 > 2 GB = int overflow! (size * 3 is evaluated before the division by 2, so the intermediate product wraps negative.)
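The arithmetic can be reproduced in plain Java (a standalone demo of the same expression, not Hadoop code): with size around 700 MB, size * 3 wraps past Integer.MAX_VALUE before the division by 2, yielding a negative capacity.

```java
public class SetSizeOverflowDemo {
    public static void main(String[] args) {
        // Same expression as BytesWritable.setSize: setCapacity(size * 3 / 2)
        int size = 700 * 1024 * 1024;    // 734,003,200 bytes (~700 MB)
        int newCapacity = size * 3 / 2;  // size * 3 = 2,202,009,600 > Integer.MAX_VALUE
        System.out.println(newCapacity); // → -1046478848
        // new byte[newCapacity] then throws NegativeArraySizeException
    }
}
```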

As a result, you cannot deserialize (though you can write and serialize) more than about 700 MB into a BytesWritable.
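A minimal sketch of the suggested alternative (the "/input" path and 1 GB size are placeholders from the question; assumes Hadoop 2.x on the classpath): ArrayPrimitiveWritable records the exact array length, so deserialization allocates byte[length] directly instead of growing a buffer by size * 3 / 2.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayPrimitiveWritable;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;

public class LargeValueWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path file = new Path("/input");
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(file),
            SequenceFile.Writer.compression(CompressionType.NONE),
            SequenceFile.Writer.keyClass(BytesWritable.class),
            // Use ArrayPrimitiveWritable for the large value instead of
            // BytesWritable, avoiding the size * 3 / 2 capacity overflow.
            SequenceFile.Writer.valueClass(ArrayPrimitiveWritable.class));
        try {
            byte[] value = new byte[1024 * 1024 * 1024]; // 1 GB value
            writer.append(new BytesWritable(new byte[]{1}),
                          new ArrayPrimitiveWritable(value));
        } finally {
            writer.close();
        }
    }
}
```

On the read side, ArrayPrimitiveWritable.get() returns an Object that must be cast back to byte[].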
