How can I use proto3 with Hadoop/Spark?

Question

I've got several .proto files which rely on syntax = "proto3";. I also have a Maven project that is used to build Hadoop/Spark jobs (Hadoop 2.7.1 and Spark 1.5.2). I'd like to generate data in Hadoop/Spark and then serialize it according to my proto3 files.

Using libprotoc 3.0.0, I generate Java sources which work fine within my Maven project as long as I have the following in my pom.xml:

<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>3.0.0-beta-1</version>
</dependency>  

Now, when I use my libprotoc-generated classes in a job that gets deployed to a cluster I get hit with:

java.lang.VerifyError : class blah overrides final method mergeUnknownFields.(Lcom/google/protobuf/UnknownFieldSet;)Lcom/google/protobuf/GeneratedMessage$Builder;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
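
One way to confirm this kind of conflict on the cluster is to ask the JVM where a class was actually loaded from. The sketch below is a hypothetical helper (`ClassOrigin` is not from the original post) that uses only JDK APIs; run inside a task with an argument such as `com.google.protobuf.GeneratedMessage`, it should report whether that class came from your jar or from the protobuf-java jar on the Hadoop/Spark classpath.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical diagnostic helper (not part of the original answer):
// reports which JAR or directory a class was loaded from.
final class ClassOrigin {

    static String locate(String className) {
        // Prefer the thread context class loader (what task code usually sees),
        // falling back to this class's own loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            // initialize=false: load the class without running static initializers
            Class<?> clazz = Class.forName(className, false, loader);
            ProtectionDomain domain = clazz.getProtectionDomain();
            CodeSource source = (domain == null) ? null : domain.getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader have no code source
                return "<bootstrap or unknown>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not found>";
        } catch (LinkageError e) {
            // e.g. the VerifyError above, if the class itself fails verification
            return "<linkage error: " + e + ">";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.google.protobuf.GeneratedMessage";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location is a Hadoop or Spark jar rather than your own, the 2.5.0 classes are winning on the classpath.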

ClassLoader failing seems reasonable given that Hadoop/Spark have a dependency on protobuf-java 2.5.0 which is incompatible with my 3.0.0-beta-1. I also noticed that protobufs (presumably versions < 3) have found their way into my jar in a few other places:

$ jar tf target/myjar-0.1-SNAPSHOT.jar | grep protobuf | grep '/$'
org/apache/hadoop/ipc/protobuf/
org/jboss/netty/handler/codec/protobuf/
META-INF/maven/com.google.protobuf/
META-INF/maven/com.google.protobuf/protobuf-java/
org/apache/mesos/protobuf/
io/netty/handler/codec/protobuf/
com/google/protobuf/
google/protobuf/
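
The `jar tf ... | grep protobuf | grep '/$'` pipeline above can also be run from Java, for example as part of a build-time check. This is a minimal sketch using `java.util.jar.JarFile`; the class name `ProtobufDirs` is made up for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical sketch mirroring:
//   jar tf target/myjar-0.1-SNAPSHOT.jar | grep protobuf | grep '/$'
final class ProtobufDirs {

    static List<String> protobufDirectories(String jarPath) throws IOException {
        List<String> dirs = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                // "grep protobuf" keeps names containing "protobuf";
                // "grep '/$'" keeps only directory entries (ending in "/")
                if (name.contains("protobuf") && name.endsWith("/")) {
                    dirs.add(name);
                }
            }
        }
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ProtobufDirs <path-to-jar>");
            System.exit(2);
        }
        for (String dir : protobufDirectories(args[0])) {
            System.out.println(dir);
        }
    }
}
```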

Is there something I can do (Maven Shade?) to sort this out?

Similar question here: Spark java.lang.VerifyError

Answer

Turns out this kinda thing is documented here: https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html

Just need to relocate the protobuf classes in the maven-shade-plugin's <configuration> section, and the VerifyError goes away:

          <relocations>
            <relocation>
              <pattern>com.google.protobuf</pattern>
              <shadedPattern>shaded.com.google.protobuf</shadedPattern>
            </relocation>
          </relocations>
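
After adding the relocation, a quick sanity check is that the shaded jar holds protobuf classes under `shaded/com/google/protobuf/` and none under the original `com/google/protobuf/`. The sketch below is a hypothetical check (`RelocationCheck` is not part of the answer or of the Maven Shade Plugin); the `shaded/` prefix assumes the `shadedPattern` shown above.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical post-build check: counts protobuf classes left in the original
// package vs. those moved to the relocated (shaded) package.
final class RelocationCheck {
    static final String ORIGINAL = "com/google/protobuf/";
    // Assumes <shadedPattern>shaded.com.google.protobuf</shadedPattern>
    static final String SHADED = "shaded/com/google/protobuf/";

    /** Returns {unrelocated class count, relocated class count}. */
    static int[] countProtobufClasses(String jarPath) throws IOException {
        int original = 0;
        int shaded = 0;
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                if (!name.endsWith(".class")) {
                    continue; // only count compiled classes
                }
                if (name.startsWith(ORIGINAL)) {
                    original++;
                } else if (name.startsWith(SHADED)) {
                    shaded++;
                }
            }
        }
        return new int[] {original, shaded};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RelocationCheck <path-to-shaded-jar>");
            System.exit(2);
        }
        int[] counts = countProtobufClasses(args[0]);
        System.out.println("unrelocated: " + counts[0] + ", relocated: " + counts[1]);
        // Non-zero exit if original-package classes remain or nothing was relocated
        if (counts[0] > 0 || counts[1] == 0) {
            System.exit(1);
        }
    }
}
```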