sbt assembly shading to create fat jar to run on spark
Question
I'm using sbt assembly to create a fat jar which can run on Spark. I have a dependency on grpc-netty. The Guava version on Spark is older than the one required by grpc-netty, and I run into this error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I was able to resolve this by setting userClassPathFirst to true on Spark, but that leads to other errors.
Correct me if I am wrong, but from what I understand, I shouldn't have to set userClassPathFirst to true if I do shading correctly. Here's how I do shading now:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.guava.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)
libraryDependencies ++= Seq(
  "org.scalaj" %% "scalaj-http" % "2.3.0",
  "org.json4s" %% "json4s-native" % "3.2.11",
  "org.json4s" %% "json4s-jackson" % "3.2.11",
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.2.0" % "provided",
  "org.clapper" %% "argot" % "1.0.3",
  "com.typesafe" % "config" % "1.3.1",
  "com.databricks" %% "spark-csv" % "1.5.0",
  "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided",
  "io.grpc" % "grpc-netty" % "1.1.2",
  "com.google.guava" % "guava" % "20.0"
)
What am I doing wrong here and how do I fix it?
Answer
You are almost there. What ShadeRule.rename does is rename class names, not library names:
The main ShadeRule.rename rule is used to rename classes. All references to the renamed classes will also be updated.
Indeed, there are no classes in com.google.guava:guava whose package is com.google.guava:
$ jar tf ~/Downloads/guava-20.0.jar | sed -e 's:/[^/]*$::' | sort | uniq
META-INF
META-INF/maven
META-INF/maven/com.google.guava
META-INF/maven/com.google.guava/guava
com
com/google
com/google/common
com/google/common/annotations
com/google/common/base
com/google/common/base/internal
com/google/common/cache
com/google/common/collect
com/google/common/escape
com/google/common/eventbus
com/google/common/graph
com/google/common/hash
com/google/common/html
com/google/common/io
com/google/common/math
com/google/common/net
com/google/common/primitives
com/google/common/reflect
com/google/common/util
com/google/common/util/concurrent
com/google/common/xml
com/google/thirdparty
com/google/thirdparty/publicsuffix
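To see why the original pattern matched nothing, here is a minimal Python sketch of how a jarjar-style rename rule of the form "prefix.** -> replacement.@1" applies to class names. This only illustrates the pattern semantics; it is not sbt-assembly's actual implementation:

```python
def shade(pattern, replacement, class_name):
    """Sketch of a jarjar-style rename rule: 'prefix.**' matches any class
    under prefix, and '@1' in the replacement stands for the part matched
    by '**'. Illustration only, not sbt-assembly's real code."""
    prefix = pattern[:-len(".**")]           # e.g. "com.google.guava"
    if not class_name.startswith(prefix + "."):
        return class_name                    # rule does not apply
    captured = class_name[len(prefix) + 1:]  # the part matched by '**'
    return replacement.replace("@1", captured)

# Guava's classes actually live under com.google.common, so the rule on
# com.google.guava.** leaves them untouched:
print(shade("com.google.guava.**", "my_conf.@1",
            "com.google.common.base.Preconditions"))
# -> com.google.common.base.Preconditions (unchanged, nothing is shaded)

# A rule on com.google.common.** relocates them as intended:
print(shade("com.google.common.**", "my_conf.@1",
            "com.google.common.base.Preconditions"))
# -> my_conf.base.Preconditions
```

After assembling with a rule on com.google.common.**, something like `jar tf <your-assembly jar> | grep my_conf` should list the relocated Guava classes.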
It should be enough to change your shading rule to this:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)
With this, you don't need to change userClassPathFirst.
Moreover, you can simplify your shading rule like this:
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1").inAll
)
Since the org.apache.spark dependencies are provided, they will not be included in your jar and will not be shaded (hence Spark will use its own, unshaded version of Guava that it has on the cluster).