Why isn't guava being shaded properly in my build.sbt?

Problem description

tl;dr: Here's a repo containing the problem.


Cassandra and HDFS both use guava internally, but neither of them shades the dependency for various reasons. Because the versions of guava aren't binary compatible, I'm finding NoSuchMethodErrors at runtime.
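(Some context the post leaves implicit: Hadoop 2.6 was compiled against an old Guava in which Objects.toStringHelper(Object) still existed; later Guava releases deprecated it and eventually removed it in favor of MoreObjects.toStringHelper. A minimal sketch of the mismatch, assuming Guava 18+ on the classpath; the class name is illustrative:)

import com.google.common.base.MoreObjects

// Hadoop's compiled bytecode still references Objects.toStringHelper(this),
// which newer Guava no longer ships; hence the NoSuchMethodError below.
// The modern equivalent of that call is:
class MetricsRegistryLike(info: String) {
  override def toString: String =
    MoreObjects.toStringHelper(this).add("info", info).toString
}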

I've tried to shade guava myself in my build.sbt:

val HadoopVersion = "2.6.0-cdh5.11.0"

// ...

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion % "test" classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion % "test" classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion % Test

// ...

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfs).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommon).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopHdfsTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopCommonTest).inProject,
  ShadeRule.rename("com.google.common.**" -> "shade.com.google.common.@1").inLibrary(hadoopMiniDFSCluster).inProject
)

assemblyJarName in assembly := s"${name.value}-${version.value}.jar"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
  case _ => MergeStrategy.first
}

but the runtime exception persists (ha -- it's a cassandra joke, people).

The specific exception is

[info] HdfsEntitySpec *** ABORTED ***
[info]   java.lang.NoSuchMethodError: com.google.common.base.Objects.toStringHelper(Ljava/lang/Object;)Lcom/google/common/base/Objects$ToStringHelper;
[info]   at org.apache.hadoop.metrics2.lib.MetricsRegistry.toString(MetricsRegistry.java:406)
[info]   at java.lang.String.valueOf(String.java:2994)
[info]   at java.lang.StringBuilder.append(StringBuilder.java:131)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.<init>(RetryCacheMetrics.java:46)
[info]   at org.apache.hadoop.ipc.metrics.RetryCacheMetrics.create(RetryCacheMetrics.java:53)
[info]   at org.apache.hadoop.ipc.RetryCache.<init>(RetryCache.java:202)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initRetryCache(FSNamesystem.java:1038)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:949)
[info]   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:796)
[info]   at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1040)
[info]   ...

How can I properly shade guava to stop the runtime errors?

Solution

The shading rules only apply when you are building a fat jar; they won't be applied during any other sbt task.
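One way to see this (a sketch; the jar path is just an example): list the assembly's entries and look for the renamed package. Classes under shade/com/google/common appear only in the assembly output, while sbt test runs against the original, unshaded classpath, which is why the spec still fails.

import java.util.jar.JarFile
import scala.collection.JavaConverters._

object InspectShading extends App {
  // Example path; use whatever assemblyJarName produces for your build.
  val jar = new JarFile("target/scala-2.11/myproject-1.0.jar")
  jar.entries.asScala
    .map(_.getName)
    .filter(_.startsWith("shade/"))
    .take(10)
    .foreach(println)
  jar.close()
}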

If you want to shade some library inside your hadoop dependencies, you can create a new project with only the hadoop dependencies, shade the libraries there, and publish a fat jar with all the shaded hadoop dependencies.

This is not a perfect solution: all of the dependencies bundled into the new hadoop jar will be "unknown" to whoever uses it, and you will need to handle conflicts manually.
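For example (coordinates hypothetical): if some other library still drags the unshaded hadoop artifacts in transitively, the consuming build has to exclude them by hand, because sbt's eviction mechanism cannot see inside a fat jar:

// Sketch of manual conflict handling in the consuming build.sbt.
libraryDependencies += ("com.example" %% "some-hdfs-client" % "1.0.0") // hypothetical
  .exclude("org.apache.hadoop", "hadoop-common")
  .exclude("org.apache.hadoop", "hadoop-hdfs")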

Here is the code you will need in your build.sbt to publish a fat hadoop jar (based on your code and the sbt-assembly docs):

val HadoopVersion = "2.6.0-cdh5.11.0"

val hadoopHdfs = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion
val hadoopCommon = "org.apache.hadoop" % "hadoop-common" % HadoopVersion
val hadoopHdfsTest = "org.apache.hadoop" % "hadoop-hdfs" % HadoopVersion classifier "tests"
val hadoopCommonTest = "org.apache.hadoop" % "hadoop-common" % HadoopVersion classifier "tests"
val hadoopMiniDFSCluster = "org.apache.hadoop" % "hadoop-minicluster" % HadoopVersion

lazy val fatJar = project
  .enablePlugins(AssemblyPlugin)
  .settings(
    libraryDependencies ++= Seq(
        hadoopHdfs,
        hadoopCommon,
        hadoopHdfsTest,
        hadoopCommonTest,
        hadoopMiniDFSCluster
    ),
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "shade.@0").inAll
    ),
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
      case _ => MergeStrategy.first
    },
    artifact in (Compile, assembly) := {
      val art = (artifact in (Compile, assembly)).value
      art.withClassifier(Some("assembly"))
    },
    addArtifact(artifact in (Compile, assembly), assembly),
    crossPaths := false, // Do not append Scala versions to the generated artifacts
    autoScalaLibrary := false, // Do not pull the Scala library into the dependencies
    skip in publish := true
  )

lazy val shaded_hadoop = project
  .settings(
    name := "shaded-hadoop",
    packageBin in Compile := (assembly in (fatJar, Compile)).value
  )

I haven't tested it, but that is the gist of it.
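If you do try it, the flow would roughly be (untested, using the names above): publish the fat jar with shaded_hadoop/publishLocal, then point your main project at the published artifact instead of the raw hadoop modules:

// Sketch for the main build.sbt; organization and version are whatever
// shaded-hadoop was published under (both hypothetical here).
libraryDependencies += "your.organization" % "shaded-hadoop" % "0.1.0-SNAPSHOT"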


I'd like to point out another issue I noticed: your merge strategy might cause you problems, since you want to apply different strategies to some of the files. See the default strategy here.
I would recommend using something like this to preserve the original strategy for every entry whose default is not deduplicate:

assemblyMergeStrategy in assembly := {
  entry: String => {
    // Ask sbt-assembly's default strategy for this entry and only
    // downgrade deduplicate conflicts to "first".
    val strategy = (assemblyMergeStrategy in assembly).value(entry)
    if (strategy == MergeStrategy.deduplicate) MergeStrategy.first
    else strategy
  }
}
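A variant of the same idea (my sketch, not from the original answer) that keeps the explicit manifest rule and hoists the .value lookup out of the per-entry function:

assemblyMergeStrategy in assembly := {
  val defaultStrategy = (assemblyMergeStrategy in assembly).value // previous value of this setting
  (entry: String) => entry match {
    case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
    case _ =>
      defaultStrategy(entry) match {
        case MergeStrategy.deduplicate => MergeStrategy.first
        case other => other
      }
  }
}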
