NoClassDefFoundError org.apache.hadoop.fs.FSDataInputStream when executing spark-shell

Question

I've downloaded the prebuilt version of Spark 1.4.0 without Hadoop (with user-provided Hadoop). When I ran the spark-shell command, I got this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 7 more

I've searched on the Internet; it is said that HADOOP_HOME has not been set in spark-env.cmd. But I cannot find spark-env.cmd in the Spark installation folder. I've traced the spark-shell command and it seems there is no HADOOP_CONFIG in there. I've tried adding HADOOP_HOME as an environment variable, but it still gives the same exception.

Actually, I'm not really using Hadoop. I downloaded Hadoop as a workaround, as suggested in this question.

I am using Windows 8 and Scala 2.10.

Any help will be appreciated. Thanks.

Answer

The "without Hadoop" in the Spark build's name is misleading: it means the build is not tied to a specific Hadoop distribution, not that it is meant to run without one. The user is expected to indicate where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html).

One clean way to fix this issue is to:

  1. Obtain Hadoop Windows binaries. Ideally build them, but this is painful (for some hints see: Hadoop on Windows Building/Installation Error). Otherwise Google some up; for instance, currently you can download 2.6.0 from here: http://www.barik.net/archive/2015/01/19/172716/
  2. Create a spark-env.cmd file looking like this (modify the Hadoop path to match your installation; a fuller illustrative version appears after this list):

         @echo off
         set HADOOP_HOME=D:\Utils\hadoop-2.7.1
         set PATH=%HADOOP_HOME%\bin;%PATH%
         set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>

  3. Put this spark-env.cmd either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable.
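For illustration, here is what a complete spark-env.cmd could look like once the classpath has been pasted in. The SPARK_DIST_CLASSPATH value below is only a guess at a typical Hadoop 2.x directory layout, not the output of a real run; always use the exact string your own hadoop classpath prints:

    @echo off
    REM Illustrative spark-env.cmd. Both the HADOOP_HOME path and the
    REM SPARK_DIST_CLASSPATH value are assumptions; replace them with your
    REM install path and the real output of "%HADOOP_HOME%\bin\hadoop classpath".
    set HADOOP_HOME=D:\Utils\hadoop-2.7.1
    set PATH=%HADOOP_HOME%\bin;%PATH%
    set SPARK_DIST_CLASSPATH=%HADOOP_HOME%\etc\hadoop;%HADOOP_HOME%\share\hadoop\common\lib\*;%HADOOP_HOME%\share\hadoop\common\*;%HADOOP_HOME%\share\hadoop\hdfs\lib\*;%HADOOP_HOME%\share\hadoop\hdfs\*;%HADOOP_HOME%\share\hadoop\yarn\lib\*;%HADOOP_HOME%\share\hadoop\yarn\*;%HADOOP_HOME%\share\hadoop\mapreduce\*

To verify, open a new command prompt (setting SPARK_CONF_DIR to the folder holding this file if you chose that option) and run spark-shell again; if the file is being picked up, the NoClassDefFoundError for FSDataInputStream should be gone.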
