Spark History Server on S3A FileSystem: ClassNotFoundException
Problem Description
Spark can use the Hadoop S3A file system org.apache.hadoop.fs.s3a.S3AFileSystem. By adding the following to conf/spark-defaults.conf, I can get spark-shell to log to the S3 bucket:
spark.jars.packages net.java.dev.jets3t:jets3t:0.9.0,com.google.guava:guava:16.0.1,com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.eventLog.enabled true
spark.eventLog.dir s3a://spark-logs-test/
spark.history.fs.logDirectory s3a://spark-logs-test/
spark.history.provider org.apache.hadoop.fs.s3a.S3AFileSystem
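One thing the snippet above leaves implicit is where S3A gets its AWS credentials. A hedged sketch of how they are commonly supplied in the same file, using the standard Hadoop 2.7 S3A properties (the values are placeholders; environment variables or IAM instance roles work as well):

```
# Illustrative only: fs.s3a.access.key / fs.s3a.secret.key are the
# standard Hadoop S3A credential properties; replace the placeholder values.
spark.hadoop.fs.s3a.access.key YOUR_ACCESS_KEY
spark.hadoop.fs.s3a.secret.key YOUR_SECRET_KEY
```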
The Spark History Server also loads its configuration from conf/spark-defaults.conf, but it does not seem to pick up the spark.jars.packages setting, and throws a ClassNotFoundException:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:256)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
The Spark source code for loading configuration differs between SparkSubmitArguments.scala and HistoryServerArguments.scala; in particular, HistoryServerArguments does not appear to handle the spark.jars.packages option.
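The failure mode can be reproduced outside Spark: spark.jars.packages is resolved by spark-submit, so a JVM launched any other way (like the History Server, which resolves the provider class via Class.forName) only sees what is on its plain classpath. A minimal, hypothetical probe illustrating that lookup (ClasspathProbe and its method name are my own, not Spark API):

```java
public class ClasspathProbe {
    // Returns true if the named class is visible on the current classpath,
    // mirroring the Class.forName lookup HistoryServer performs at startup.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Without hadoop-aws on the classpath, this reports the class missing,
        // regardless of what spark.jars.packages says in spark-defaults.conf.
        String s3a = "org.apache.hadoop.fs.s3a.S3AFileSystem";
        System.out.println(isOnClasspath(s3a) ? "found: " + s3a : "missing: " + s3a);
    }
}
```

This is why dropping the JARs into $SPARK_HOME/jars (below) works: that directory is on the History Server's classpath, while packages fetched by spark-submit are not.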
Is there a way to add the org.apache.hadoop.fs.s3a.S3AFileSystem dependency to the History Server?
Answer
Did some more digging and figured it out. Here's what was wrong:
- The JARs required for S3A can be added to $SPARK_HOME/jars.
- The line spark.history.provider org.apache.hadoop.fs.s3a.S3AFileSystem in $SPARK_HOME/conf/spark-defaults.conf causes an Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.&lt;init&gt;(org.apache.spark.SparkConf). That line can be safely removed, as suggested in this answer.
To summarize, I added the following JARs to $SPARK_HOME/jars:
- jets3t-0.9.3.jar (may already be present in your pre-built Spark binary; the exact 0.9.x version does not seem to matter)
- guava-14.0.1.jar (may already be present in your pre-built Spark binary; the exact 14.0.x version does not seem to matter)
- aws-java-sdk-1.7.4.jar (must be 1.7.4)
- hadoop-aws.jar, version 2.7.3 (should probably match the Hadoop version of your Spark build)
and added this line to $SPARK_HOME/conf/spark-defaults.conf:
spark.history.fs.logDirectory s3a://spark-logs-test/
You'll need some other configuration to enable logging in the first place, but once the S3 bucket has the logs, this is the only configuration that is needed for the History Server.
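Putting the answer together, the resulting $SPARK_HOME/conf/spark-defaults.conf would look roughly like this (a sketch using the bucket name from the question; note there is no spark.history.provider line, and the JAR list above replaces the spark.jars.packages approach for the History Server):

```
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.eventLog.enabled true
spark.eventLog.dir s3a://spark-logs-test/
spark.history.fs.logDirectory s3a://spark-logs-test/
```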