Spark: How to write different group values from an RDD to different files?

Views: 151 Published: 2016/5/22 16:29:06 Tags: scala apache-spark

Problem description

I need to write values with key 1 to file file1.txt and values with key 2 to file2.txt:

val ar = Array (1 -> 1, 1 -> 2, 1 -> 3, 1 -> 4, 1 -> 5, 2 -> 6, 2 -> 7, 2 -> 8, 2 -> 9)
val distAr = sc.parallelize(ar)
val grk = distAr.groupByKey()

How can I do this without iterating over the collection grk twice?

Recommended answer

We write data from different customers to different tables, which is essentially the same use case. The common pattern we use is something like this:

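val customers: List[String] = ???  // the customer ids; left unspecified here

// One filter-and-save job per customer. belongsToCustomer and saveToFoo are placeholders.
customers.foreach { customer =>
  rdd.filter(record => belongsToCustomer(record, customer)).saveToFoo()
}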
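Applied to the data from the question, the same filter-per-key pattern might look like the sketch below. The object name `WritePerKey`, the helper `outputPath`, the `/tmp/output` base directory, and the local master setting are assumptions made for illustration; they do not come from the original answer. Note that `saveAsTextFile` creates a directory with that name containing part files, and it fails if the directory already exists.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WritePerKey {
  // Hypothetical helper: builds the output location for one key.
  def outputPath(base: String, key: Int): String = s"$base/file$key.txt"

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("write-per-key").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val ar = Array(1 -> 1, 1 -> 2, 1 -> 3, 1 -> 4, 1 -> 5, 2 -> 6, 2 -> 7, 2 -> 8, 2 -> 9)
    // Cache the RDD so the per-key filters reuse the computed partitions.
    val distAr = sc.parallelize(ar).cache()

    // Collect the distinct keys to the driver, then run one filter-and-save job per key.
    val keys = distAr.keys.distinct().collect()
    keys.foreach { key =>
      distAr
        .filter { case (k, _) => k == key }
        .values
        .saveAsTextFile(outputPath("/tmp/output", key))
    }

    sc.stop()
  }
}
```

Caching is a design choice here: each `filter` still scans every record, but the source data is only materialised once.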

This probably does not fulfill the wish of 'not iterating over the RDD twice (or n times)', but filter is a cheap operation in a parallel distributed environment and it works, so I think it complies with the 'general Spark way' of doing things.
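If a single pass really is required, one alternative (a different technique from the filter pattern above, and not part of the original answer) is to route records by key with Hadoop's `MultipleTextOutputFormat` from the old `mapred` API. The following is a minimal sketch under that assumption; the names `KeyBasedOutput` and `WritePerKeySinglePass`, the `/tmp/output-by-key` directory, and the partition count are hypothetical.

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Routes each record to a file named after its key.
class KeyBasedOutput extends MultipleTextOutputFormat[Any, Any] {
  // Replace the key with NullWritable so only the value is written to each line.
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()

  // Ignore the default part-file name and use file<key>.txt instead.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"file$key.txt"
}

object WritePerKeySinglePass {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("write-per-key-single-pass").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val ar = Array(1 -> 1, 1 -> 2, 1 -> 3, 1 -> 4, 1 -> 5, 2 -> 6, 2 -> 7, 2 -> 8, 2 -> 9)

    sc.parallelize(ar)
      // Put all records of a key into one partition so a single task writes each file.
      .partitionBy(new HashPartitioner(2))
      .saveAsHadoopFile("/tmp/output-by-key", classOf[Any], classOf[Any], classOf[KeyBasedOutput])

    sc.stop()
  }
}
```

The `partitionBy` step matters because the file name no longer includes the partition id: without it, two tasks holding the same key would try to write the same file.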
