How Kryo serializer allocates buffer in Spark


Question

Please help me understand how the Kryo serializer allocates memory for its buffer.

My Spark app fails on a collect step when it tries to collect about 122 MB of data from the workers to the driver.

com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 57197
    at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:220)
    at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:206)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:29)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:312)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)

This exception appears even after I increased the driver memory to 3 GB and the executor memory to 4 GB, and increased the buffer size for the Kryo serializer (I'm using Spark 1.3):

conf.set('spark.kryoserializer.buffer.mb', '256')
conf.set('spark.kryoserializer.buffer.max', '512')

I think I've set the buffer to be big enough, but my Spark app keeps crashing. How can I check which objects are using the Kryo buffer on an executor? Is there a way to clean it up?

Recommended answer

In my case, the problem was using the wrong property name for the max buffer size.

Up to Spark version 1.3, the property name is spark.kryoserializer.buffer.max.mb, with ".mb" at the end. But I used the property name from the Spark 1.4 docs, spark.kryoserializer.buffer.max.

As a result, the Spark app ignored my setting and used the default value of 64 MB, which was not enough for the amount of data I was processing.

After I changed the property name to spark.kryoserializer.buffer.max.mb, my app worked fine.
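
To make the naming difference concrete, here is a minimal sketch of a helper that picks the Kryo buffer property names for a given Spark version. The function name kryo_buffer_settings and its parameters are hypothetical, not part of Spark. It assumes that Spark 1.3 and earlier take plain megabyte numbers under the ".mb" names, and that Spark 1.4 and later take size strings such as "512m" under the names without ".mb".

```python
def kryo_buffer_settings(spark_version, buffer_mb, max_mb):
    """Return Kryo buffer properties named for the given Spark version.

    Hypothetical helper for illustration; it only builds a dict and
    does not touch Spark itself.
    """
    # Compare only the major and minor parts, e.g. "1.3.1" -> (1, 3).
    major, minor = (int(part) for part in spark_version.split(".")[:2])

    if (major, minor) <= (1, 3):
        # Spark 1.3 and earlier: names end in ".mb", values are MB counts.
        return {
            "spark.kryoserializer.buffer.mb": str(buffer_mb),
            "spark.kryoserializer.buffer.max.mb": str(max_mb),
        }

    # Spark 1.4 and later (assumed): no ".mb" suffix, values carry a unit.
    return {
        "spark.kryoserializer.buffer": "%dm" % buffer_mb,
        "spark.kryoserializer.buffer.max": "%dm" % max_mb,
    }


settings = kryo_buffer_settings("1.3.0", 256, 512)
for key, value in sorted(settings.items()):
    print(key, "=", value)
```

With PySpark available, each pair could then be applied with conf.set(key, value) on the same conf object used in the question.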
