Spark: how to use SparkContext.textFile with the local file system

Published: 2016/5/22 15:38:27 | Tag: apache-spark

Question

I'm just getting started with Apache Spark (in Scala, but the language is irrelevant). I'm using standalone mode and want to process a text file from a local file system (so nothing distributed like HDFS).

According to the documentation of SparkContext's textFile method, it will

Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings.
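A minimal sketch of calling textFile on a local file, assuming Spark 1.6 or later with spark-core on the classpath. To stay self-contained it runs in local mode (`local[4]`, an assumption; the question uses a standalone cluster), where the driver and its worker threads share one machine. The object `LocalTextFile` and the helper `countLines` are hypothetical names. `Path.toUri` produces a `file://` URI, which selects the local file system explicitly instead of whatever default file system the Hadoop configuration points to.

```scala
import java.nio.file.Path
import org.apache.spark.{SparkConf, SparkContext}

// Assumes spark-core (1.6+) on the classpath; object and method names are hypothetical.
object LocalTextFile {
  /** Reads a local text file with textFile and returns (partition count, line count). */
  def countLines(path: Path, minPartitions: Int): (Int, Long) = {
    // local[4]: the driver plus 4 worker threads in one JVM, so a local path is
    // visible to every task. With a standalone master ("spark://..."), the
    // same path would have to exist on every worker machine.
    val conf = new SparkConf().setAppName("local-textFile").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // Path.toUri yields "file:///...", forcing Hadoop's local file system
      // instead of the configured default (often HDFS on a cluster).
      val lines = sc.textFile(path.toUri.toString, minPartitions)
      (lines.getNumPartitions, lines.count())
    } finally {
      sc.stop()
    }
  }
}
```

`minPartitions` is only a hint for the minimum number of splits: an uncompressed file is cut into at least that many byte ranges, while a gzip-compressed file, being unsplittable, still yields a single partition.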

What is unclear to me is whether the whole text file can simply be copied to all the nodes, or whether the input data should already be partitioned: e.g. with 4 nodes and a CSV file of 1000 lines, should each node hold 250 lines?

I suspect each node should have the whole file but I'm not sure.
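For what it's worth, Spark does not ship a `file://` input to the workers: each task opens the path on its own machine and reads only the byte range (split) it was assigned. The Spark programming guide accordingly states that a path on the local file system must be accessible at the same path on all worker nodes, either by copying the file to every worker or by using a network-mounted shared file system. Because the scheduler decides at run time which node reads which split, in practice each node needs the complete file. The sketch below is a simplified, hypothetical model of line-oriented splitting (`LineSplits`, `splitRanges` and `readSplit` are made-up names, and this is not Hadoop's actual `LineRecordReader` code): each reader seeks into the full file by byte offset, looks at the byte before its range, and may read past the end of its range to finish a line, which only works when the whole file sits at that path.

```scala
// Hypothetical, simplified model of reading line-oriented byte-range splits.
// Convention used here: a split owns every line whose first byte lies in [start, end).
object LineSplits {

  /** Byte ranges [start, end) dividing `size` bytes into about `n` equal splits. */
  def splitRanges(size: Int, n: Int): Seq[(Int, Int)] = {
    val step = math.max(1, math.ceil(size.toDouble / n).toInt)
    (0 until size by step).map(start => (start, math.min(start + step, size)))
  }

  /** Lines owned by the split [start, end), read from the complete file contents. */
  def readSplit(data: String, start: Int, end: Int): Seq[String] = {
    // Skip a partial first line: advance to the first line that begins at or
    // after `start`. This inspects data(start - 1), a byte outside the range.
    var pos = start
    while (pos > 0 && pos < data.length && data(pos - 1) != '\n') pos += 1
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    while (pos < end) {
      val nl = data.indexOf('\n', pos)
      val lineEnd = if (nl == -1) data.length else nl
      out += data.substring(pos, lineEnd) // the last line may run past `end`
      pos = lineEnd + 1
    }
    out.toSeq
  }
}
```

Concatenating the splits in order yields every line exactly once, with no line cut in half at a split boundary; a node that held only "its" 250 lines would have neither the right byte offsets nor the bytes past its boundary.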
