Error through remote Spark Job: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem


Problem Description

I am trying to run a remote Spark job through IntelliJ with a Spark HDInsight cluster (HDI 4.0). In my Spark application I am trying to read an input stream from a folder of parquet files in Azure Blob storage using Spark Structured Streaming's built-in readStream function.
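
For reference, here is a minimal sketch of that kind of streaming read. The schema fields, container, account, and path names are placeholders I've made up for illustration, not the original code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructType}

object ParquetStreamReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetStreamReader")
      .getOrCreate()

    // Streaming file sources require an explicit schema; these fields
    // stand in for whatever the parquet files actually contain.
    val schema = new StructType()
      .add("id", LongType)
      .add("value", StringType)

    // wasbs:// is the Azure Blob storage scheme; the container, account,
    // and folder below are hypothetical names.
    val input = spark.readStream
      .schema(schema)
      .parquet("wasbs://mycontainer@myaccount.blob.core.windows.net/input")

    // Echo the stream to the console just to demonstrate the read.
    val query = input.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}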

The code works as expected when I run it on a Zeppelin notebook attached to the HDInsight cluster. However, when I deploy my Spark application to the cluster, I encounter the following error:

java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator

Subsequently, I am unable to read any data from blob storage.

The little information I found online suggested that this is caused by a version conflict between Spark and Hadoop. The application is run with Spark 2.4 prebuilt for Hadoop 2.7.

To fix this, I ssh into each head and worker node of the cluster and manually downgrade the Hadoop dependencies from 3.1.x to 2.7.3 to match the version in my local spark/jars folder. After doing this, I am then able to deploy my application successfully. Downgrading the cluster from HDI 4.0 is not an option, as it is the only cluster version that supports Spark 2.4.

To summarize, could the issue be that I am using a Spark download prebuilt for Hadoop 2.7? Is there a better way to fix this conflict instead of manually downgrading the Hadoop versions on the cluster's nodes or changing the Spark version I am using?

Recommended Answer

After troubleshooting the methods I had previously attempted, I came across the following fix:

In my pom.xml I excluded the hadoop-client dependency that the spark-core jar imports automatically. This dependency was version 2.6.5, which conflicted with the cluster's version of Hadoop. Instead, I imported the version I required.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.version.major}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>
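
Note that scala.version.major, spark.version, and hadoop.version are assumed to be Maven properties defined elsewhere in the POM, with hadoop.version matching the cluster's Hadoop release (3.1.x on HDI 4.0). As a sanity check, the hadoop-client version Maven actually resolves can be inspected with the standard dependency plugin:

mvn dependency:tree -Dincludes=org.apache.hadoop:hadoop-client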

After making this change, I encountered the error java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0. Further research revealed this was due to a problem with the Hadoop configuration on my local machine. Per this article's advice, I modified the winutils.exe version I had under C://winutils/bin to be the version I required and also added the corresponding hadoop.dll. After making these changes, I was able to successfully read data from blob storage as expected.
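
For anyone hitting the same UnsatisfiedLinkError locally, here is a minimal sketch of pointing Hadoop at the winutils directory from code, assuming the C:/winutils layout described above. Setting the HADOOP_HOME environment variable before launching the JVM achieves the same thing:

object WindowsHadoopSetup {
  def main(args: Array[String]): Unit = {
    // hadoop.home.dir must point at the directory that CONTAINS bin/
    // (C:/winutils, not C:/winutils/bin), and must be set before any
    // Hadoop classes are loaded.
    System.setProperty("hadoop.home.dir", "C:/winutils")

    // ... build the SparkSession and run the job as usual from here.
  }
}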

TL;DR: The issue was the auto-imported hadoop-client dependency, which was fixed by excluding it and adding the new winutils.exe and hadoop.dll under C://winutils/bin.

With this fix, I no longer needed to downgrade the Hadoop versions within the HDInsight cluster or change my downloaded Spark version.
