Spark Standalone cluster cannot read files in the local filesystem

Problem description

I have a Spark standalone cluster with 2 worker nodes and 1 master node.

Using spark-shell, I was able to read data from a file on the local filesystem, then did some transformations and saved the final RDD in /home/output (let's say). The RDD got saved successfully, but only on one worker node; on the master node only the _SUCCESS file was there.
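
A minimal spark-shell sketch of the steps described above (the input path and the transformation are illustrative, not taken from the question):

// read a file from the local filesystem
val input = sc.textFile("file:///home/input.txt")

// some transformation (placeholder)
val transformed = input.map(_.toUpperCase)

// each executor writes its own partitions to /home/output on the node it
// runs on, and the driver writes _SUCCESS locally, which is why the output
// ends up scattered across machines
transformed.saveAsTextFile("file:///home/output")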

Now, if I want to read this output data back from /home/output, I do not get any data: it finds 0 records on the master, and I am assuming that it does not check the other worker nodes for it.

It would be great if someone could throw some light on why Spark is not reading from all the worker nodes, or what mechanism Spark uses to read data from the worker nodes.

scala> sc.wholeTextFiles("/home/output/")
res7: org.apache.spark.rdd.RDD[(String, String)] = /home/output/ MapPartitionsRDD[5] at wholeTextFiles at <console>:25

scala> res7.count
res8: Long = 0

Recommended answer

You should put the file on all worker machines, at the same path and with the same name. When Spark reads or writes a local (file://) path on a standalone cluster, each executor only sees its own node's filesystem, which is why the output was scattered across workers and why reading /home/output back returns nothing.
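
A sketch of both options, assuming the copy has already been done (hostnames and paths other than /home/output are illustrative):

// Option 1: after copying /home/output (same path, same contents) to every
// worker and the master, e.g. with scp or rsync, the read works:
sc.wholeTextFiles("file:///home/output/").count

// Option 2: avoid the manual copy by writing to storage every node can see,
// such as HDFS (the namenode address below is a placeholder):
// transformed.saveAsTextFile("hdfs://namenode:9000/user/spark/output")
// sc.textFile("hdfs://namenode:9000/user/spark/output").count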
