Error using Spark's Kryo serializer with Java protocol buffers that have arrays of strings

Question

I am hitting a bug when using Java protocol buffer classes as the object model for RDDs in Spark jobs.

For my application, my .proto file has properties that are repeated strings. For example:

message OntologyHumanName {
  repeated string family = 1;
}

From this, the protoc 2.5.0 compiler generates Java code like:

private com.google.protobuf.LazyStringList family_ = com.google.protobuf.LazyStringArrayList.EMPTY;

If I run a Scala Spark job that uses the Kryo serializer, I get the following error:

Caused by: java.lang.NullPointerException
at com.google.protobuf.UnmodifiableLazyStringList.size(UnmodifiableLazyStringList.java:61)
at java.util.AbstractList.add(AbstractList.java:108)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
... 40 more

The same code works fine with spark.serializer=org.apache.spark.serializer.JavaSerializer.

My environment is CDH QuickStart 5.5 with JDK 1.8.0_60.

Recommended answer

Try registering the Lazy class with Kryo:

import com.esotericsoftware.kryo.Kryo;

Kryo kryo = new Kryo();
kryo.register(com.google.protobuf.LazyStringArrayList.class);
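
In a Spark job the Kryo instance is created by Spark itself, so a registration like the one above is usually supplied through configuration rather than through a hand-built Kryo object. The fragment below is a minimal sketch of that, assuming the job reads its settings from spark-defaults.conf (the file location is an assumption; the question does not say how the job is configured). It uses the standard spark.serializer and spark.kryo.classesToRegister properties. Whether registration alone removes the NullPointerException from the stack trace is not confirmed here.

```
# Assumed location: conf/spark-defaults.conf (or pass each line via --conf key=value)
# Use Kryo instead of the default Java serializer
spark.serializer               org.apache.spark.serializer.KryoSerializer
# Comma-separated list of classes Spark registers with Kryo at startup
spark.kryo.classesToRegister   com.google.protobuf.LazyStringArrayList
```

The same list can also hold the protoc-generated message classes, separated by commas, if those need to be registered as well.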

Also, for custom Protobuf messages, take a look at the solution in this answer for registering custom/nested classes generated by protoc.
