Task not serializable in Spark caused by UTFDataFormatException: encoded string too long


Question

I am having problems running my Spark application on YARN. I have fairly extensive integration tests that run without any problems, but when I run the application on YARN it throws the following error:

17/01/06 11:22:23 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Task not serializable
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2067)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
    at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.map(RDD.scala:323)
    at org.apache.spark.sql.DataFrame.map(DataFrame.scala:1410)
    at com.orgx.yy.dd.check.DQCheck$class.runDQCheck(DQCheck.scala:24)
    at com.orgx.yy.dd.check.DQBatchCheck.runDQCheck(DQBatchCheck.scala:13)
    at com.orgx.yy.dd.check.DQBatchCheck.doCheck(DQBatchCheck.scala:23)
    at com.orgx.yy.dd.DQChecker$.main(DQChecker.scala:60)
    at com.orgx.yy.dd.DQChecker.main(DQChecker.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
Caused by: java.io.UTFDataFormatException: encoded string too long: 72887 bytes
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
    at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:295)
    at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:369)
    at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:309)
    at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:369)
    at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:309)
    at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:369)
    at com.typesafe.config.impl.SerializedConfigValue.writeValueData(SerializedConfigValue.java:309)
    at com.typesafe.config.impl.SerializedConfigValue.writeValue(SerializedConfigValue.java:369)
    at com.typesafe.config.impl.SerializedConfigValue.writeExternal(SerializedConfigValue.java:435)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 20 more
17/01/06 11:22:24 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Task not serializable)

The culprit seems to be java.io.UTFDataFormatException: encoded string too long: 72887 bytes. Does anyone have any idea why that is happening?

Answer

I managed to resolve this issue. The problem was that I had introduced a Typesafe Config object into one of the classes used by the function that failed to serialize. Including the config pushed one of the serialized strings past the 64KB limit that DataOutputStream.writeUTF enforces (visible at the top of the Caused by trace).
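The 64KB limit is not Spark-specific: DataOutputStream.writeUTF stores the encoded string length in an unsigned 16-bit field, so any string whose modified-UTF-8 encoding exceeds 65535 bytes fails, and Typesafe Config's writeExternal happens to use writeUTF for its values. A minimal, Spark-free sketch reproducing the exact exception from the trace (the 72887-character size is taken from the error message above):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class WriteUtfLimit {
    // Attempts writeUTF on a string of `asciiChars` ASCII characters.
    // Returns null on success, otherwise the exception message.
    public static String tryWriteUtf(int asciiChars) throws Exception {
        char[] chars = new char[asciiChars];
        Arrays.fill(chars, 'x'); // ASCII chars encode to 1 byte each in modified UTF-8
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(new String(chars));
            return null;
        } catch (UTFDataFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // 65535 encoded bytes still fit in the unsigned 16-bit length field
        System.out.println(tryWriteUtf(65535));
        // the size from the stack trace exceeds it
        System.out.println(tryWriteUtf(72887)); // "encoded string too long: 72887 bytes"
    }
}
```

So the exception fires as soon as any single config string in the closure serializes to more than 65535 bytes, regardless of how much executor memory is available.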

When I removed the config object from the class, everything worked fine again.
