Apache Spark: Importing jars

Problem Description

I am using Apache Spark on my Windows machine. I am relatively new to this, and I am working locally before uploading my code to the cluster.

I've written a very simple Scala program, and everything works fine:

println("creating Dataframe from json")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rawData = sqlContext.read.json("test_data.txt")
println("this is the test data table")
rawData.show()
println("finished running") 

The program executes correctly. I now want to add some processing which calls some simple Java functions that I've pre-packaged in a JAR file. I'm running the Scala shell. As it says on the getting started page, I start up the shell with:

c:\Users\eshalev\Desktop\spark-1.4.1-bin-hadoop2.6\bin\spark-shell --master local[4] --jars myjar-1.0-SNAPSHOT.jar
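For illustration, a call into such a JAR from the shell would look roughly like the sketch below; the class name com.example.TextUtils, its cleanup method, and the way it is applied are assumed placeholders, not the actual contents of myjar-1.0-SNAPSHOT.jar:

import com.example.TextUtils  // hypothetical class assumed to be packaged in the JAR passed via --jars

println("applying the packaged Java helper to each row")
// map over the DataFrame's rows and pass each one through the (assumed) static Java method
val processed = rawData.rdd.map(row => TextUtils.cleanup(row.toString))
processed.take(5).foreach(println)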

Important fact: I don't have Hadoop installed on my local machine. But as I'm only parsing a text file this shouldn't matter, and it didn't matter until I used --jars.

I now proceed to run the same Scala program. There are no references to the JAR file yet... This time I get:

...some SPARK debug code here and then...
    15/09/08 14:27:37 INFO Executor: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar with timestamp 1441715239626
    15/09/08 14:27:37 INFO Utils: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar-1.0 to C:\Users\eshalev\AppData\Local\Temp\spark-dd9eb37f-4033-4c37-bdbf-5df309b5eace\userFiles-ebe63c02-8161-4162-9dc0-74e3df6f7356\fetchFileTemp2982091960655942774.tmp
    15/09/08 14:27:37 INFO Executor: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar with timestamp 1441715239626
    15/09/08 14:27:37 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
    java.lang.NullPointerException
            at java.lang.ProcessBuilder.start(Unknown Source)
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
            at org.apache.hadoop.util.Shell.run(Shell.java:455)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
            at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
            at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
            at org.apache.spark.util.Utils$.fetchFile(Utils.scala:465)
... aplenty more spark debug messages here, and then ...
this is the test data table
<console>:20: error: not found: value rawData
              rawData.show()
              ^
finished running

I double checked http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar-1.0-SNAPSHOT.jar, and I can download it just fine. Then again, nothing in the code references the JAR yet. If I start the shell without --jars, everything works fine.

Recommended Answer

I tried this on another cluster, which runs Spark 1.3.1 and has Hadoop installed. It worked flawlessly.

The number of times Hadoop is mentioned in the stack trace on my single-node setup leads me to believe that an actual Hadoop installation is required to use the --jars flag.

The other possibility is a problem with my Spark 1.4 setup, which had worked flawlessly until then.
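Consistent with that reading, the failure in the stack trace happens under org.apache.hadoop.fs.FileUtil.chmod, which shells out through ProcessBuilder and, on Windows, relies on Hadoop's native helper binaries (winutils.exe) being available. A minimal sketch of one way to point Hadoop at such binaries follows; this workaround and the C:\hadoop path are assumptions, not something verified in the original answer:

// Assumed workaround sketch: tell Hadoop's Shell utilities where bin\winutils.exe lives.
// This needs to run before any Hadoop class is touched, e.g. as the very first line of
// the spark-shell session. "C:\\hadoop" is a placeholder path, not a verified location.
System.setProperty("hadoop.home.dir", "C:\\hadoop")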
