Spark pipe function throws No such file or directory


Problem description

I am running the Spark pipe function on the EMR master server in the REPL, just to test out the pipe functionality. I am using the following examples:

https://stackoverflow.com/a/32978183/8876462
http://blog.madhukaraphatak.com/pipe-in-spark/
http://hadoop-makeitsimple.blogspot.com/2016/05/pipe-in-spark.html

Here is my code:

import org.apache.spark._

val distScript = "/home/hadoop/PipeEx.sh"
val distScriptName = "PipeEx.sh"
sc.addFile(distScript)

val ipData = sc.parallelize(List("asd", "xyz", "zxcz", "sdfsfd", "Ssdfd", "Sdfsf"))
val opData = ipData.pipe(SparkFiles.get(distScriptName))
opData.foreach(println)

I have tried different things, like making the file executable and placing the file in /usr/lib/spark/bin as suggested in another post. I changed the distScript to say

"file:///home/hadoop/PipeEx.sh"

I always get "No such file or directory" pointing at the tmp/spark*/userFiles* location. I have tried to access and run the shell program from that tmp location directly, and it runs fine. My shell script is the same as the one in http://blog.madhukaraphatak.com/pipe-in-spark/

Here is the first part of the log:

[Stage 9:>                                                          (0 + 2) / 2]
18/03/19 19:58:22 WARN TaskSetManager: Lost task 1.0 in stage 9.0 (TID 72, ip-172-31-42-11.ec2.internal, executor 9): java.io.IOException: Cannot run program "/mnt/tmp/spark-bdd582ec-a5ac-4bb1-874e-832cd5427b18/userFiles-497f6051-6f49-4268-b9c5-a28c2ad5edc6/PipeEx.sh": error=2, No such file or directory
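
A quick way to see the mismatch behind this error (a diagnostic sketch of my own, assuming the same REPL session as above) is to print where SparkFiles.get resolves on the driver versus on the executors:

import org.apache.spark.SparkFiles

// Driver-side resolution: this is the literal string pipe() received above,
// because SparkFiles.get was evaluated before the job was submitted.
println(SparkFiles.get("PipeEx.sh"))

// Executor-side resolution: each executor fetches addFile() files into its
// own local directory, so on a real cluster these paths differ from the driver's.
sc.parallelize(1 to 2, 2).map(_ => SparkFiles.get("PipeEx.sh")).collect().foreach(println)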

Does anyone have any idea? I am using Spark 2.2.1 and Scala 2.11.8.

Thanks

Recommended answer

I was able to solve this once I removed the SparkFiles.get(distScriptName) call. So my final code looks like this:

val distScript = "/home/hadoop/PipeEx.sh"
val distScriptName = "./PipeEx.sh"
sc.addFile(distScript)   // ships the script to every executor

val ipData = sc.parallelize(List("asd", "xyz", "zxcz", "sdfsfd", "Ssdfd", "Sdfsf"))
val opData = ipData.pipe(distScriptName)   // relative path, resolved on each executor
opData.collect().foreach(println)

I am not entirely sure why removing SparkFiles.get() solved the problem. The most likely explanation is that SparkFiles.get() is evaluated on the driver, so pipe() received the driver's local temp path (the /mnt/tmp/spark-*/userFiles-* directory from the log), which does not exist on the worker nodes; the relative name "./PipeEx.sh" works because addFile() places a copy of the file in each executor's working directory.
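
If you do want to keep SparkFiles.get(), it has to be evaluated on the executors rather than on the driver. Below is a minimal sketch of that alternative (my own illustration using scala.sys.process instead of RDD.pipe, not part of the original answer; it assumes the script is executable on the executors):

import org.apache.spark.SparkFiles
import scala.sys.process._

// Resolve the script path inside the task, where it points at the executor's
// local copy, and pipe each line through the script manually.
val opData2 = ipData.mapPartitions { lines =>
  val script = SparkFiles.get("PipeEx.sh") // executor-side path, which exists
  lines.map(line => (s"echo $line" #| script).!!.trim)
}
opData2.collect().foreach(println)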

