How to read multiple text files into a single RDD?

Views: 1585 Posted: 2016/5/19 22:16:11 Tags: apache, apache-spark

Question

I want to read a bunch of text files from an HDFS location and run a map over them in an iteration using Spark.

JavaRDD<String> records = ctx.textFile(args[1], 1); can read only one file at a time.

I want to read more than one file and process them as a single RDD. How can I do that?

Recommended answer

You can specify whole directories, use wildcards, and even pass a comma-separated list of directories and wildcards. For example:

sc.textFile("/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file")

As Nick Chammas points out, this exposes Hadoop's FileInputFormat (http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html), so the same path syntax also works with Hadoop (and Scalding).
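Since the question uses the Java API, here is a minimal sketch of how the comma-separated path argument might be assembled before it is handed to textFile. The class InputPaths and its join method are hypothetical names made up for this illustration and are not part of Spark or Hadoop. The Spark call itself appears only as a comment so that the snippet compiles without Spark on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spark): builds the single path string
// that can be passed to textFile when several inputs should form one RDD.
final class InputPaths {
    private InputPaths() {}

    // Joins directories, glob patterns and single files with commas.
    // Entries are trimmed, blank entries are skipped, and nothing is escaped.
    static String join(String... paths) {
        List<String> cleaned = new ArrayList<>();
        for (String p : paths) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        if (cleaned.isEmpty()) {
            throw new IllegalArgumentException("no input paths given");
        }
        return String.join(",", cleaned);
    }

    public static void main(String[] args) {
        // Same inputs as in the answer's example.
        String all = join("/my/dir1", "/my/paths/part-00[0-5]*",
                          "/another/dir", "/a/specific/file");
        System.out.println(all);
        // With Spark available, the call from the question would become:
        // JavaRDD<String> records = ctx.textFile(all, 1);
    }
}
```

Building the string in one place keeps the list of inputs easy to change; whether the paths actually resolve is only checked when Spark reads them.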
