Conflict between httpclient version and Apache Spark
Question
I'm developing a Java application using Apache Spark. I use this version:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
</dependency>
In my code, there is a transitive dependency:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>
I package my application into a single JAR file. When deploying it on an EC2 instance using spark-submit, I get this error:
Caused by: java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:87)
at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:65)
at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:58)
at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)
This error clearly shows that SparkSubmit has loaded an older version of the same Apache httpclient library, which is why the conflict happens.
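A quick way to confirm which copy of a class actually won at runtime is to ask the JVM where it was loaded from. The sketch below is an assumption on my part, not part of the original question: it uses String as a stand-in class so it runs anywhere, but in a Spark job you would swap in org.apache.http.conn.ssl.SSLConnectionSocketFactory to see whether Spark's bundled httpclient or your own 4.5.2 copy was picked up.

```java
// Diagnostic sketch: print which JAR a class was actually loaded from.
// Swap the stand-in class for org.apache.http.conn.ssl.SSLConnectionSocketFactory
// inside your Spark job to identify the winning httpclient JAR.
public class WhichJar {
    public static void main(String[] args) {
        Class<?> cls = String.class; // stand-in; JDK classes have no code source
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        System.out.println(src == null
                ? "loaded by the bootstrap class loader"
                : src.getLocation().toString()); // path of the JAR that supplied the class
    }
}
```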
What is a good way to solve this issue?
For some reason, I cannot upgrade Spark in my Java code. However, I could do that with the EC2 cluster easily. Is it possible to deploy my Java code on a cluster with a higher version, say 1.6.1?
Answer
As said in your post, Spark is loading an older version of httpclient. The solution is to use Maven's relocation facility to produce a neat conflict-free project.
Here's an example of how to use it in your pom.xml file:
<project>
    <!-- Your project definition here, with the groupId, artifactId, and its dependencies -->
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <relocations>
                                <relocation>
                                    <pattern>org.apache.http.client</pattern>
                                    <shadedPattern>shaded.org.apache.http.client</shadedPattern>
                                </relocation>
                            </relocations>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
This will move all files from org.apache.http.client to shaded.org.apache.http.client, resolving the conflict.
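One caveat worth noting, which is my observation rather than part of the original answer: the failing class in the stack trace, org.apache.http.conn.ssl.SSLConnectionSocketFactory, lives outside the org.apache.http.client package, so the pattern above would not relocate it. If the error persists, a broader pattern covering the whole httpclient package tree may be needed:

```xml
<!-- Broader relocation (a sketch): also covers org.apache.http.conn.ssl -->
<relocation>
    <pattern>org.apache.http</pattern>
    <shadedPattern>shaded.org.apache.http</shadedPattern>
</relocation>
```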
Original answer:
If this is simply a matter of transitive dependencies, you could just add this to your spark-core dependency to exclude the HttpClient used by Spark:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
        </exclusion>
    </exclusions>
</dependency>
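As an alternative (a sketch of my own, not from the original answer), you can also pin the httpclient version for the whole build with dependencyManagement, which forces every resolution, direct or transitive, to 4.5.2:

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.2</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Either way, you can check which version Maven actually resolved with mvn dependency:tree -Dincludes=org.apache.httpcomponents before packaging.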
I also added the provided scope to your dependency, as it will be provided by your cluster.
However, that might muck around with Spark's internal behaviour. If you still get an error after doing this, you could try using Maven's relocation facility, which should produce a neat conflict-free project.
Regarding the fact that you can't upgrade Spark's version, did you use exactly this dependency declaration from mvnrepository?
Spark being backwards compatible, there shouldn't be any problem deploying your job on a cluster with a higher version.