Data access Spark EC2
Question
After following the instructions to install a cluster via the EC2 script, I'm not able to launch my .jar correctly because it doesn't find the data file I put in /root/persistent-hdfs/ on the master and slave nodes. I read in another post that I need to prefix the file location with file://, but it doesn't change anything... I get this error:
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file://root/persistent-hdfs/data/ds_1.csv
To launch the job I used ./bin/spark-submit on the master node. Is that correct?
Thank you in advance for your support.
Answer
- The default configuration uses the ephemeral HDFS, so you need to turn that off:
  $ /root/ephemeral-hdfs/bin/stop-all.sh
  and turn the persistent one on:
  $ /root/persistent-hdfs/bin/start-all.sh
- Put your file into the persistent HDFS root directory for simplicity:
  $ /root/persistent-hdfs/bin/hadoop fs -put /root/ds_1.csv /ds_1.csv
  Now check that it is actually there:
  $ /root/persistent-hdfs/bin/hadoop fs -ls
- Finally, edit Spark's configuration files in /root/spark/conf/spark-defaults.conf and /root/spark/conf/spark-env.sh, and change everything that says ephemeral to persistent.
Assuming you put your csv in the root directory of the persistent HDFS (as we did in step 2), you can access it in Spark using val rawData = sc.textFile("/ds_1.csv").
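Putting the steps together, a minimal job reading the file from persistent HDFS might look like the sketch below. This is an illustration, not code from the original post: the object name DataAccessExample is a placeholder, and it assumes the persistent HDFS from the steps above is running and configured as Spark's default filesystem, so a bare path resolves against it with no file:// prefix.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: read ds_1.csv from the persistent HDFS root.
// Assumes persistent HDFS is started and Spark's configs point at it.
object DataAccessExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DataAccessExample")
    val sc   = new SparkContext(conf)

    // Bare path: resolved against the cluster's default filesystem
    // (persistent HDFS after the config change), not the local disk.
    val rawData = sc.textFile("/ds_1.csv")
    println(s"Line count: ${rawData.count()}")

    sc.stop()
  }
}
```

You would then package this into your .jar and launch it from the master node with ./bin/spark-submit, passing the placeholder class name via --class.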
Have fun!