Spark on YARN conflicts with Elasticsearch TransportClient because of different Guava library versions


Question

I want to run a Spark job on a Google Cloud VM cluster, and inside a map operation I need to query Elasticsearch. The problem is that Spark and Elasticsearch conflict on the Guava library: Spark uses Guava 14, while Elasticsearch uses Guava 18.

Specifically, the failing call is com.google.common.util.concurrent.MoreExecutors.directExecutor(), which exists in Guava 18 but not in Guava 14.
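
One way to confirm which Guava version actually wins on the classpath is a small reflection probe. The sketch below is only an illustration: the class name GuavaProbe and its helper methods are hypothetical, and it uses reflection so that it compiles without Guava present.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical helper (not part of Spark, Elasticsearch, or Guava).
final class GuavaProbe {

    /** Returns true if the class can be loaded and exposes a public method with this name. */
    static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // class missing or not loadable
        }
    }

    /** Returns the jar or directory the class was loaded from, if it can be determined. */
    static String origin(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "unknown (bootstrap class loader)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String cls = "com.google.common.util.concurrent.MoreExecutors";
        System.out.println("directExecutor present: " + hasMethod(cls, "directExecutor"));
        System.out.println("loaded from: " + origin(cls));
    }
}
```

Calling the same two helpers from inside the map operation would report what the executors see, which may differ from the driver's classpath.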

In more detail, the job I am trying to run looks something like this:

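input.map { record =>
  // esHost, esPort: address of the Elasticsearch node (placeholders)
  val client = openConnection(esHost, esPort)
  // schematic: stands for the actual Elasticsearch query
  val newdata = client.query(record.someInfo)
  new record(newdata)
}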

The openConnection method looks like this:

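import java.net.InetAddress;
import java.net.UnknownHostException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public static TransportClient openConnection(String ipAddress, int ipPort) throws UnknownHostException {
    // Cluster name must match the name configured on the Elasticsearch side
    Settings settings = Settings.settingsBuilder().put("cluster.name", "elasticsearch").build();
    // Build the client and register the node's transport address
    TransportClient client = TransportClient.builder().settings(settings).build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(ipAddress), ipPort));
    return client;
}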

I tried to use shading to force Elasticsearch to use Guava 18 by adding a shade rule to the sbt file:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.0" % "provided"

libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.2.0"

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.*" -> "googlecommona.@1").
    inLibrary("org.elasticsearch" % "elasticsearch" % "2.2.0"))

However, the problem remains. Is there a way to resolve this conflict?

Recommended answer

Shading was the answer. The rule below, added to the build.sbt file, works for a Spark cluster on YARN that uses the Elasticsearch TransportClient class.

  assemblyShadeRules in assembly :=Seq(
      ShadeRule.rename("com.google.**" -> "googlecommona.@1").inAll
  )
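
The rule above differs from the one in the question in two ways: it uses `**` instead of `*`, and `.inAll` instead of `.inLibrary(...)`. To illustrate the wildcard part, here is a rough emulation of jarjar-style pattern matching, which sbt-assembly shading builds on. It assumes that `*` matches exactly one package component and `**` matches any number of them; the class ShadeRuleSketch and the regex translation are illustrative approximations, not the library's real implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative approximation of jarjar-style rename rules (hypothetical class).
final class ShadeRuleSketch {

    /** Returns the renamed class name, or null if the pattern does not match. */
    static String rename(String pattern, String target, String className) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*' && i + 1 < pattern.length() && pattern.charAt(i + 1) == '*') {
                regex.append("(.+)");      // "**": may span several package components
                i++;
            } else if (c == '*') {
                regex.append("([^.]+)");   // "*": a single component, no dots
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(className);
        if (!m.matches()) {
            return null;
        }
        String result = target;
        for (int g = 1; g <= m.groupCount(); g++) {
            result = result.replace("@" + g, m.group(g)); // @1, @2, ... refer to wildcards
        }
        return result;
    }

    public static void main(String[] args) {
        String cls = "com.google.common.util.concurrent.MoreExecutors";
        System.out.println(rename("com.google.common.*", "googlecommona.@1", cls));
        System.out.println(rename("com.google.**", "googlecommona.@1", cls));
    }
}
```

Under these assumed semantics, `com.google.common.*` does not match MoreExecutors because the class sits three packages below com.google.common, while `com.google.**` maps it to googlecommona.common.util.concurrent.MoreExecutors. This is one plausible reason the first attempt had no effect; restricting the rule with `.inLibrary` may have contributed as well.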

For completeness, here is the whole sbt file:

import sbt.ExclusionRule
import sbt.Keys._

lazy val root = (project in file(".")).
  settings(
  name := "scala_code",
  version := "1.0",
  scalaVersion := "2.10.6",
  conflictManager := ConflictManager.latestRevision,
  test in assembly := {},
  assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
      case _ => MergeStrategy.first
  },

  parallelExecution in test := false,
  libraryDependencies += "com.fasterxml.jackson.module" % "jackson-module-scala_2.10" % "2.6.5",
  libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided"  exclude("javax.servlet", "servlet-api"),
  libraryDependencies += "org.wikidata.wdtk" % "wdtk-datamodel" % "0.6.0" exclude ("com.fasterxml.jackson.core",  "jackson-annotations"),
  libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.6.0" % "provided"  ,
  libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided" ,
  libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.0" % "provided",
  libraryDependencies += "org.scalatest" % "scalatest_2.10" % "2.0" % "test",
  libraryDependencies += "com.typesafe" % "config" % "1.2.1",
  libraryDependencies += "org.jsoup" % "jsoup" % "1.8.3",
  libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.2.0",// exclude("com.google.guava", "guava"),

  assemblyShadeRules in assembly :=Seq(
      ShadeRule.rename("com.google.**" -> "googlecommona.@1").inAll
  )

)
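
After building the assembly, a quick sanity check is to look at the package prefixes inside the resulting jar. The sketch below is an assumption-laden illustration: ShadedJarCheck is a hypothetical helper, and the jar path is whatever the build produced, passed as the first argument.

```java
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper for inspecting an assembled jar.
final class ShadedJarCheck {

    /** Counts .class entries in the jar whose path starts with the given prefix. */
    static long countClasses(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith(prefix) && name.endsWith(".class"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args[0]; // path to the assembled jar
        System.out.println("com/google/ classes:    " + countClasses(jarPath, "com/google/"));
        System.out.println("googlecommona/ classes: " + countClasses(jarPath, "googlecommona/"));
    }
}
```

If the shade rule was applied to every dependency, one would expect the Guava classes to appear under googlecommona/ and no classes to remain under com/google/.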
