Spark on YARN conflict with Elasticsearch TransportClient because of different Guava library versions
Problem description
I want to run a Spark job on a Google Cloud VM cluster, and inside a map operation I need to query Elasticsearch. My problem is that Spark and Elasticsearch conflict over the Guava library: Spark uses Guava 14 and Elasticsearch uses Guava 18.
Specifically, the failing call is to com.google.common.util.concurrent.MoreExecutors.directExecutor(), which exists in Guava 18 but not in Guava 14.
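One way to confirm which Guava version a JVM actually sees is to probe for the method by reflection and print where the class was loaded from. The following is a minimal sketch, not part of the original question; the class name GuavaProbe and its helper methods are hypothetical and use only standard JDK reflection calls.

```java
import java.security.CodeSource;

final class GuavaProbe {
    /** Returns true if the class can be loaded and has a public no-argument method with this name. */
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            // 'false' avoids running static initializers just to inspect the class.
            Class<?> c = Class.forName(className, false, GuavaProbe.class.getClassLoader());
            c.getMethod(methodName);
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class missing, method missing, or class could not be linked.
            return false;
        }
    }

    /** Returns the location (usually a jar URL) the class was loaded from, if known. */
    static String loadedFrom(String className) {
        try {
            Class<?> c = Class.forName(className, false, GuavaProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "unknown (bootstrap class loader)" : String.valueOf(src.getLocation());
        } catch (ReflectiveOperationException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String cls = "com.google.common.util.concurrent.MoreExecutors";
        System.out.println("directExecutor present: " + hasNoArgMethod(cls, "directExecutor"));
        System.out.println("loaded from: " + loadedFrom(cls));
    }
}
```

Calling the two helpers from inside the map function and logging the result would show what the executors see, which can differ from what the driver or a local build sees.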
In more detail, the job I am trying to run looks something like the following:
input.map(record=>{
val client=openConnection()
val newdata=client.query(record.someInfo)
new record(newdata)
})
The openConnection method is as follows:
public static TransportClient openConnection(String ipAddress, int ipPort) throws UnknownHostException {
Settings settings = Settings.settingsBuilder().put("cluster.name", "elasticsearch").build();
TransportClient client = TransportClient.builder().settings(settings).build().
addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(ipAddress), ipPort));
return client;
}
I have tried to use shading to force Elasticsearch to use Guava 18 by adding a shading rule to the sbt file, as follows:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.2.0"
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.*" -> "googlecommona.@1")
    .inLibrary("org.elasticsearch" % "elasticsearch" % "2.2.0")
)
The problem, however, seems to remain. Is there a way to resolve this conflict?
Recommended answer
Shading was the answer: I added the following rule to the build.sbt file.
The solution below works for a Spark cluster on YARN that uses the Elasticsearch TransportClient class.
assemblyShadeRules in assembly :=Seq(
ShadeRule.rename("com.google.**" -> "googlecommona.@1").inAll
)
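To make the effect of the rename pattern concrete, here is a simplified model of how such a pattern maps class names. It assumes the Jar Jar Links-style semantics that sbt-assembly documents for shade rules ("*" matches a single package component, "**" matches any substring including dots, and "@1" refers to the first wildcard). The class ShadeRenameSketch is hypothetical and is not the tool's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShadeRenameSketch {
    /** Translates a shade pattern into a regex: "**" spans dots, "*" stays within one component. */
    static Pattern toRegex(String pattern) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            if (pattern.startsWith("**", i)) {
                sb.append("(.*)");
                i += 2;
            } else if (pattern.charAt(i) == '*') {
                sb.append("([^.]*)");
                i += 1;
            } else {
                sb.append(Pattern.quote(String.valueOf(pattern.charAt(i))));
                i += 1;
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Applies the rule to a class name; returns the name unchanged if the pattern does not match. */
    static String rename(String pattern, String result, String className) {
        Matcher m = toRegex(pattern).matcher(className);
        if (!m.matches()) {
            return className;
        }
        String out = result;
        // Substitute higher-numbered references first so "@10" is not clobbered by "@1".
        for (int g = m.groupCount(); g >= 1; g--) {
            out = out.replace("@" + g, m.group(g));
        }
        return out;
    }

    public static void main(String[] args) {
        String cls = "com.google.common.util.concurrent.MoreExecutors";
        System.out.println(rename("com.google.**", "googlecommona.@1", cls));
        System.out.println(rename("com.google.common.*", "googlecommona.@1", cls));
    }
}
```

Under this model, "com.google.**" rewrites the class to googlecommona.common.util.concurrent.MoreExecutors, while the single-star pattern "com.google.common.*" from the question leaves it unchanged because the remainder spans several package components. That may be part of why the first attempt did not help, alongside the difference between inLibrary, which restricts the rule to one dependency, and inAll, which applies it to every jar in the assembly.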
I attach the whole sbt file for completeness:
import sbt.ExclusionRule
import sbt.Keys._

lazy val root = (project in file(".")).
  settings(
    name := "scala_code",
    version := "1.0",
    scalaVersion := "2.10.6",
    conflictManager := ConflictManager.latestRevision,
    test in assembly := {},
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
      case _ => MergeStrategy.first
    },
    parallelExecution in test := false,
    libraryDependencies += "com.fasterxml.jackson.module" % "jackson-module-scala_2.10" % "2.6.5",
    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided" exclude("javax.servlet", "servlet-api"),
    libraryDependencies += "org.wikidata.wdtk" % "wdtk-datamodel" % "0.6.0" exclude("com.fasterxml.jackson.core", "jackson-annotations"),
    libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.6.0" % "provided",
    libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided",
    libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.0" % "provided",
    libraryDependencies += "org.scalatest" % "scalatest_2.10" % "2.0" % "test",
    libraryDependencies += "com.typesafe" % "config" % "1.2.1",
    libraryDependencies += "org.jsoup" % "jsoup" % "1.8.3",
    libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.2.0", // exclude("com.google.guava", "guava"),
    // Relocate every com.google class in the assembly so it cannot clash with the cluster's Guava 14
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.**" -> "googlecommona.@1").inAll
    )
  )
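After building the assembly, it can be worth checking that the relocation actually happened. The sketch below is an illustration rather than part of the original answer: it counts class entries under a given path prefix in a jar using the standard java.util.jar API. The class name ShadedJarCheck is hypothetical, and the jar path must be supplied by the caller.

```java
import java.io.IOException;
import java.util.jar.JarFile;

final class ShadedJarCheck {
    /** Counts non-directory entries in the jar whose path starts with the given prefix. */
    static long countEntries(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .filter(e -> !e.isDirectory())
                    .filter(e -> e.getName().startsWith(prefix))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ShadedJarCheck <path-to-assembly-jar>");
            return;
        }
        // With the shade rule applied, entries are expected under googlecommona/ rather than com/google/.
        System.out.println("relocated entries: " + countEntries(args[0], "googlecommona/"));
        System.out.println("original entries:  " + countEntries(args[0], "com/google/"));
    }
}
```

If the rule took effect, the first count should be non-zero and the second should be zero or close to it.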