sbt assembly shading to create fat jar to run on spark

Problem Description

I'm using sbt assembly to create a fat jar which can run on Spark. I have a dependency on grpc-netty. The Guava version on Spark is older than the one required by grpc-netty, and I run into this error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I was able to resolve this by setting userClassPathFirst to true on Spark, but that leads to other errors.
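
For reference, that workaround is usually applied through spark-submit, along the lines of the sketch below; both userClassPathFirst properties exist in Spark but are marked experimental, the driver setting only applies in cluster mode, and the class and jar names here are placeholders:

$ spark-submit \
    --class com.example.Main \
    --conf spark.driver.userClassPathFirst=true \
    --conf spark.executor.userClassPathFirst=true \
    myapp-assembly-0.1.jar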

Correct me if I am wrong, but from what I understand, I shouldn't have to set userClassPathFirst to true if I do shading correctly. Here's how I do shading now:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.guava.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)

libraryDependencies ++= Seq(
  "org.scalaj" %% "scalaj-http" % "2.3.0",
  "org.json4s" %% "json4s-native" % "3.2.11",
  "org.json4s" %% "json4s-jackson" % "3.2.11",
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.2.0" % "provided",
  "org.clapper" %% "argot" % "1.0.3",
  "com.typesafe" % "config" % "1.3.1",
  "com.databricks" %% "spark-csv" % "1.5.0",
  "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided",
  "io.grpc" % "grpc-netty" % "1.1.2",
  "com.google.guava" % "guava" % "20.0"
)

What am I doing wrong here and how do I fix it?

Answer

You are almost there. What ShadeRule.rename does is rename class names, not library names:

The main ShadeRule.rename rule is used to rename classes. All references to the renamed classes will also be updated.

In fact, there are no classes with the package com.google.guava in com.google.guava:guava:

$ jar tf ~/Downloads/guava-20.0.jar  | sed -e 's:/[^/]*$::' | sort | uniq
META-INF
META-INF/maven
META-INF/maven/com.google.guava
META-INF/maven/com.google.guava/guava
com
com/google
com/google/common
com/google/common/annotations
com/google/common/base
com/google/common/base/internal
com/google/common/cache
com/google/common/collect
com/google/common/escape
com/google/common/eventbus
com/google/common/graph
com/google/common/hash
com/google/common/html
com/google/common/io
com/google/common/math
com/google/common/net
com/google/common/primitives
com/google/common/reflect
com/google/common/util
com/google/common/util/concurrent
com/google/common/xml
com/google/thirdparty
com/google/thirdparty/publicsuffix

It should be enough to change your shading rule to this:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)
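
To illustrate what the rule does: the ** wildcard captures the remainder of the class name and @1 splices it back in, so classes are relocated like this (mappings inferred from the pattern, not actual sbt output):

com.google.common.base.Preconditions     ->  my_conf.base.Preconditions
com.google.common.collect.ImmutableMap   ->  my_conf.collect.ImmutableMap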

This way you don't need to change userClassPathFirst.

Moreover, you can simplify your shading rule like this:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1").inAll
)
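
To confirm the shading took effect, you can list the assembled jar the same way as above; the jar path is a placeholder for whatever sbt assembly produces in your project:

$ jar tf target/scala-2.11/myapp-assembly-0.1.jar | grep '^my_conf'

The Guava classes should now show up under my_conf/ instead of com/google/common/.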

Since the org.apache.spark dependencies are provided, they will not be included in your jar and will not be shaded (hence Spark will use its own unshaded version of Guava that it has on the cluster).
