Conflict between httpclient version and Apache Spark

Problem description

I'm developing a Java application using Apache Spark. I use this version:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
</dependency>

In my code, there is a transitive dependency:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

I package my application into a single JAR file. When deploying it on an EC2 instance using spark-submit, I get this error:

Caused by: java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:87)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:65)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:58)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)

This error clearly shows that SparkSubmit has loaded an older version of the same Apache httpclient library, which is why the conflict happens.

What is a good way to solve this issue?

For some reason, I cannot upgrade Spark in my Java code. However, I could easily do that on the EC2 cluster. Is it possible to deploy my Java code on a cluster running a higher version, say 1.6.1?

Recommended answer

As you said in your post, Spark is loading an older version of httpclient. The solution is to use Maven's relocation facility to produce a neat, conflict-free project.

Here's an example of how to use it in your pom.xml file:

<project>
  <!-- Your project definition here, with the groupId, artifactId, and its dependencies -->
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <relocations>
                <relocation>
                  <pattern>org.apache.http.client</pattern>
                  <shadedPattern>shaded.org.apache.http.client</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

This will move all files from org.apache.http.client to shaded.org.apache.http.client, resolving the conflict.
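
One note on the pattern: the stack trace fails in org.apache.http.conn.ssl.SSLConnectionSocketFactory, which sits outside org.apache.http.client, so the relocation above may not cover it. If the error persists, a broader pattern is a variation worth trying (a hedged sketch, not part of the original answer):

<relocations>
  <relocation>
    <!-- Broader pattern: rewrites every org.apache.http reference in classes bundled
         into the shaded JAR, including org.apache.http.conn.ssl where
         SSLConnectionSocketFactory lives -->
    <pattern>org.apache.http</pattern>
    <shadedPattern>shaded.org.apache.http</shadedPattern>
  </relocation>
</relocations>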

Original post:

If this is simply a matter of transitive dependencies, you could just add this to your spark-core dependency to exclude the HttpClient used by Spark:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
        </exclusion>
    </exclusions>
</dependency>

I also set the scope of that dependency to provided, since Spark will be supplied by your cluster.
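
If httpclient 4.5.2 reaches your project only transitively (as described in the question), you can also pin it by declaring it directly, so the newer version is the one compiled against and packaged. This is a hedged sketch that simply repeats the coordinates from the question, not something the original answer prescribes:

<!-- Explicit declaration pinning the newer client; coordinates taken from the question -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>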

However, excluding Spark's httpclient like this might muck around with Spark's internal behaviour. If you still get an error after doing this, you could try Maven's relocation facility, which should produce a neat, conflict-free project.

Regarding the fact that you can't upgrade Spark's version, did you use exactly this dependency declaration from mvnrepository?

Spark being backwards compatible, there shouldn't be any problem deploying your job on a cluster with a higher version.
