Running app jar file on spark-submit in a Google Dataproc cluster instance

Problem Description

I'm running a .jar file that contains all the dependencies I need packaged in it. One of these dependencies is com.google.common.util.concurrent.RateLimiter, and I have already checked that its class file is in this .jar file.

Unfortunately, when I run spark-submit on the master node of my Google Dataproc cluster instance, I get this error:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
at com.google.common.util.concurrent.RateLimiter$SleepingStopwatch$1.<init>(RateLimiter.java:417)
at com.google.common.util.concurrent.RateLimiter$SleepingStopwatch.createFromSystemTimer(RateLimiter.java:416)
at com.google.common.util.concurrent.RateLimiter.create(RateLimiter.java:130)
at LabeledAddressDatasetBuilder.publishLabeledAddressesFromBlockstem(LabeledAddressDatasetBuilder.java:60)
at LabeledAddressDatasetBuilder.main(LabeledAddressDatasetBuilder.java:144)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
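
For reference, I submit the job roughly like this (the jar name here is a placeholder; the main class is the one shown in the stack trace):

  spark-submit --class LabeledAddressDatasetBuilder your-app-with-dependencies.jar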

It seems that something is overriding my dependencies. I have already decompiled the Stopwatch.class file from this .jar and checked that the method is there. This only happens when I run on that Google Dataproc instance. I did a grep on the process executing spark-submit and got the -cp flag like this:

/usr/lib/jvm/java-8-openjdk-amd64/bin/java -cp /usr/lib/spark/conf/:/usr/lib/spark/lib/spark-assembly-1.5.0-hadoop2.7.1.jar:/usr/lib/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/spark/lib/datanucleus-core-3.2.10.jar:/etc/hadoop/conf/:/etc/hadoop/conf/:/usr/lib/hadoop/lib/native/:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*

Is there anything I can do to solve this problem?

Thank you.

Recommended Answer

As you've found, Dataproc includes Hadoop dependencies on the classpath when invoking Spark. This is done primarily so that using Hadoop input formats, file systems, etc. is fairly straightforward. The downside is that you end up with Hadoop's Guava version, which is 11.0.2 (see HADOOP-10101).
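
For context, Stopwatch.createStarted() was only added in Guava 15.0, and RateLimiter calls it internally, which is exactly what the stack trace shows. A minimal probe like the following (a hypothetical class, not part of the question's code) fails with the same NoSuchMethodError when Guava 11.0.2 wins on the classpath, and runs fine against a newer Guava:

  import com.google.common.base.Stopwatch;
  import com.google.common.util.concurrent.RateLimiter;

  // Hypothetical probe class, purely illustrative.
  public class GuavaVersionProbe {
      public static void main(String[] args) {
          // Stopwatch.createStarted() exists only in Guava 15.0+; with Hadoop's
          // Guava 11.0.2 first on the classpath, this line (and RateLimiter.create,
          // which calls it internally) throws NoSuchMethodError.
          Stopwatch stopwatch = Stopwatch.createStarted();
          RateLimiter limiter = RateLimiter.create(10.0); // 10 permits per second
          System.out.println("Guava is new enough: " + stopwatch + ", rate=" + limiter.getRate());
      }
  }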

How to work around this depends on your build system. If you are using Maven, the maven-shade plugin can be used to relocate your version of Guava under a new package name. An example of this can be seen in the GCS Hadoop Connector's packaging, but the crux of it is the following plugin declaration in the build section of your pom.xml:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <relocations>
            <relocation>
              <pattern>com.google.common</pattern>
              <shadedPattern>your.repackaged.deps.com.google.common</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>
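
After rebuilding with mvn package, you can sanity-check that the relocation took effect by listing the shaded jar's contents (the jar name below is just a placeholder for your build's actual output):

  mvn package
  unzip -l target/your-app-with-dependencies.jar | grep your/repackaged/deps/com/google/common/base/Stopwatch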

Similar relocations can be accomplished with the sbt-assembly plugin for sbt, jarjar for Ant, and either jarjar or the shadow plugin for Gradle.
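
For instance, a minimal sketch of the equivalent relocation in build.sbt with sbt-assembly (the shaded package prefix simply mirrors the Maven example above):

  assemblyShadeRules in assembly := Seq(
    // Rename every class under com.google.common, preserving the rest of the path (@1).
    ShadeRule.rename("com.google.common.**" -> "your.repackaged.deps.com.google.common.@1").inAll
  )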
