NoClassDefFoundError:使用Spark读取s3数据时出现org/apache/hadoop/fs/StreamCapabilities [英] NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities while reading s3 Data with spark

查看:265
本文介绍了NoClassDefFoundError:使用Spark读取s3数据时出现org/apache/hadoop/fs/StreamCapabilities的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在本地开发机上运行一个简单的Spark作业(通过Intellij),该作业从Amazon s3中读取数据.

I would like to run a simple spark job on my local dev machine (through Intellij) reading data from Amazon s3.

我的 build.sbt 文件:

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.1",
  "org.apache.spark" %% "spark-sql" % "2.3.1",
  "com.amazonaws" % "aws-java-sdk" % "1.11.407",
  "org.apache.hadoop" % "hadoop-aws" % "3.1.1"
)

我的代码段:

val spark = SparkSession
    .builder
    .appName("test")
    .master("local[2]")
    .getOrCreate()

  spark
    .sparkContext
    .hadoopConfiguration
    .set("fs.s3n.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")

  val schema_p = ...

  val df = spark
    .read
    .schema(schema_p)
    .parquet("s3a:///...")

然后出现以下异常:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2093)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2058)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2152)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2580)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:45)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:622)
    at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:606)
    at Test$.delayedEndpoint$Test$1(Test.scala:27)
    at Test$delayedInit$body.apply(Test.scala:4)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at Test$.main(Test.scala:4)
    at Test.main(Test.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.StreamCapabilities
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 41 more

s3a:///替换为s3:///时,出现另一个错误:No FileSystem for scheme: s3

When replacing s3a:/// to s3:/// I get another error: No FileSystem for scheme: s3

由于我是AWS的新手,所以我不知道应该使用s3:///s3a:///还是s3n:///.我已经使用aws-cli设置了我的AWS凭证.

As I am new to AWS, I do not know if I should user s3:///, s3a:/// or s3n:///. I have already setup my AWS credentials with aws-cli.

我的机器上没有任何Spark安装.

I have not any Spark installation on my machine.

预先感谢您的帮助

推荐答案

我将从查看

请勿尝试插入"比使用任何问题而构建的Hadoop版本更高的AWS开发工具包版本,更改AWS开发工具包版本将无法解决问题,仅会更改您看到的堆栈跟踪.

Do not attempt to "drop in" a newer version of the AWS SDK than that which the Hadoop version was built with Whatever problem you have, changing the AWS SDK version will not fix things, only change the stack traces you see.

无论您在本地Spark安装上安装了什么版本的hadoop- JAR,都需要完全相同的hadoop-aws版本,以及与hadoop-完全相同的aws SDK版本. aws建有.有关详细信息,请尝试 mvnrepository .

whatever version of the hadoop- JARs you have on your local spark installation, you need to have exactly the same version of hadoop-aws, and exactly the same version of the aws SDK which hadoop-aws was built with. Try mvnrepository for the details.

这篇关于NoClassDefFoundError:使用Spark读取s3数据时出现org/apache/hadoop/fs/StreamCapabilities的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆