Conflict between httpclient version and Apache Spark


Question

I'm developing a Java application using Apache Spark. I use this version:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
</dependency>

In my code, there is a transitive dependency:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

I package my application into a single JAR file. When deploying it on an EC2 instance using spark-submit, I get this error:

Caused by: java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:87)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:65)
    at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:58)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
    at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)

This error clearly shows that SparkSubmit is loading an older version of the same httpclient library.
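As a quick check (a hypothetical snippet, not part of the original question), you can print which JAR the conflicting class is actually loaded from at runtime; if the location points to a jar bundled with Spark or the AWS SDK rather than your httpclient 4.5.2, the classpath conflict is confirmed.

public class HttpClientOriginCheck {
    public static void main(String[] args) {
        // Class that fails to initialise in the stack trace above
        Class<?> clazz = org.apache.http.conn.ssl.SSLConnectionSocketFactory.class;
        // getCodeSource() may be null for classes on the bootstrap classpath
        java.security.CodeSource source = clazz.getProtectionDomain().getCodeSource();
        System.out.println(source != null ? source.getLocation() : "bootstrap classpath");
    }
}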

What is a good way to solve this issue?

For some reason, I cannot upgrade Spark in my Java code. However, I could do that with the EC2 cluster easily. Is it possible to deploy my Java code on a cluster with a higher version, say 1.6.1?

Answer

As said in your post, Spark is loading an older version of httpclient. The solution is to use Maven's relocation facility to produce a neat, conflict-free project.

Here is an example of how to use it in your pom.xml file:

<project>
  <!-- Your project definition here, with the groupId, artifactId, and its dependencies -->
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <relocations>
                <relocation>
                  <pattern>org.apache.http.client</pattern>
                  <shadedPattern>shaded.org.apache.http.client</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

This will move all files from org.apache.http.client to shaded.org.apache.http.client, resolving the conflict.
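For illustration (a hypothetical example, not from the original answer), your application code keeps importing the usual package names; at package time the shade plugin rewrites the references that match the pattern inside the fat JAR, so they can no longer clash with the copy loaded by Spark.

import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;

public class RelocationExample {
    public static void main(String[] args) throws Exception {
        // Written against org.apache.http.client.* as usual. In the shaded JAR,
        // the references matching the pattern (HttpClient, HttpGet) are rewritten
        // to shaded.org.apache.http.client.*; packages outside the pattern, such
        // as org.apache.http.impl.client, are left as-is by this configuration.
        HttpClient client = HttpClientBuilder.create().build();
        HttpGet request = new HttpGet("https://example.com");
        System.out.println(client.execute(request).getStatusLine());
    }
}

After mvn package, the shaded JAR produced in target/ is the one to pass to spark-submit.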

Original answer:

If this is simply a matter of transitive dependencies, you could just add this to your spark-core dependency to exclude the HttpClient used by Spark:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.2</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
        </exclusion>
    </exclusions>
</dependency>

I also added the scope as provided in your dependency, since it will be provided by your cluster.

However, that might muck around with Spark's internal behaviour. If you still get an error after doing this, you could try using Maven's relocation facility, which should produce a neat, conflict-free project.

Regarding the fact that you can't upgrade Spark's version, did you use exactly this dependency declaration from mvnrepository?

Spark being backwards compatible, there shouldn't be any problem deploying your job on a cluster with a higher version.
