Spark History Server on S3A FileSystem: ClassNotFoundException

Problem Description

Spark can use the Hadoop S3A file system, org.apache.hadoop.fs.s3a.S3AFileSystem. By adding the following to conf/spark-defaults.conf, I can get spark-shell to log to the S3 bucket:

spark.jars.packages               net.java.dev.jets3t:jets3t:0.9.0,com.google.guava:guava:16.0.1,com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
spark.hadoop.fs.s3a.impl          org.apache.hadoop.fs.s3a.S3AFileSystem
spark.eventLog.enabled            true
spark.eventLog.dir                s3a://spark-logs-test/
spark.history.fs.logDirectory     s3a://spark-logs-test/
spark.history.provider            org.apache.hadoop.fs.s3a.S3AFileSystem
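
As a quick sanity check, a snippet like the following, run in spark-shell, confirms that the S3A classes resolved and the bucket is reachable (a minimal sketch: the bucket name is the example from this question, and the calls are the standard Hadoop FileSystem API):

    // Run inside spark-shell once the packages above have resolved.
    // Minimal sketch; s3a://spark-logs-test/ is the example bucket from this question.
    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new URI("s3a://spark-logs-test/"), sc.hadoopConfiguration)
    fs.listStatus(new Path("/")).foreach(status => println(status.getPath))

If this throws the same ClassNotFoundException, the S3A JARs never reached the classpath; if it lists the bucket, the driver side is wired up correctly.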

The Spark History Server also loads its configuration from conf/spark-defaults.conf, but it does not seem to pick up the spark.jars.packages setting, and throws a ClassNotFoundException:

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:256)
    at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)

The Spark source code for loading configuration differs between SparkSubmitArguments.scala and HistoryServerArguments.scala; in particular, HistoryServerArguments does not appear to handle packages at all.
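
For reference, this is roughly how HistoryServer.main instantiates the provider (a condensed paraphrase of the Spark 2.x HistoryServer.scala source, not a verbatim copy): the class name is looked up on the JVM classpath and a constructor taking a SparkConf is invoked reflectively, and nothing on this path resolves spark.jars.packages coordinates:

    // Condensed paraphrase of HistoryServer.scala (Spark 2.x); not verbatim.
    val providerName = conf.getOption("spark.history.provider")
      .getOrElse(classOf[FsHistoryProvider].getName())
    val provider = Utils.classForName(providerName)     // ClassNotFoundException if the JAR is absent
      .getConstructor(classOf[SparkConf])               // NoSuchMethodException if no such constructor
      .newInstance(conf)
      .asInstanceOf[ApplicationHistoryProvider]

So the only way to make an extra class visible here is to put its JAR on the History Server's own classpath.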

Is there a way to add the org.apache.hadoop.fs.s3a.S3AFileSystem dependency to the History Server?

Recommended Answer

Did some more digging and figured it out. Here's what was wrong:

  1. The JARs required for S3A can be added to $SPARK_HOME/jars (as described in this answer).

  2. The line

         spark.history.provider     org.apache.hadoop.fs.s3a.S3AFileSystem

     in $SPARK_HOME/conf/spark-defaults.conf causes a

         Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(org.apache.spark.SparkConf)

     exception. That line can be safely removed, as suggested in this answer: S3AFileSystem has no constructor taking a SparkConf, and with spark.history.provider unset, the History Server falls back to its default provider, org.apache.spark.deploy.history.FsHistoryProvider, which does.

To summarize:

I added the following JARs to $SPARK_HOME/jars:

• jets3t-0.9.3.jar (may already be in your pre-built Spark binary; the 0.9.x version doesn't seem to matter)
• guava-14.0.1.jar (may already be in your pre-built Spark binary; the 14.0.x version doesn't seem to matter)
• aws-java-sdk-1.7.4.jar (has to be 1.7.4)
• hadoop-aws-2.7.3.jar (should probably match the Hadoop version of your Spark build)

And added this line to $SPARK_HOME/conf/spark-defaults.conf:

    spark.history.fs.logDirectory     s3a://spark-logs-test/

You'll need some other configuration to enable event logging in the first place, but once the S3 bucket has the logs, this is the only configuration the History Server needs.
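
For completeness, the event-logging settings referred to above are the ones already shown in the question's conf/spark-defaults.conf, minus the spark.history.provider line; with the JARs in $SPARK_HOME/jars, the server then starts as usual via sbin/start-history-server.sh:

    spark.eventLog.enabled            true
    spark.eventLog.dir                s3a://spark-logs-test/
    spark.history.fs.logDirectory     s3a://spark-logs-test/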
