PySpark: Convert a pair RDD back to a regular RDD

Views: 75 · Published: 2021/2/15 18:46:19 · Tags: pyspark, rdd, keyvaluepair

Question

Is there any way I can convert a pair RDD back to a regular RDD?

Suppose I get a local csv file, and I first load it as a regular rdd

rdd = sc.textFile("$path/$csv")

Then I create a pair rdd (i.e. key is the string before "," and value is the string after ",")

pairRDD = rdd.map(lambda x : (x.split(",")[0], x.split(",")[1]))
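A side note on the parsing step: the description says the value is "the string after ','", but `x.split(",")[1]` keeps only the second field when a line has more than two columns (and raises `IndexError` on a line with no comma). The minimal sketch below swaps in `str.partition`, which splits at the first comma only; `to_pair` is a hypothetical helper name, and the check runs in plain Python without Spark.

```python
def to_pair(line):
    """Split a CSV line into (key, value) at the FIRST comma only."""
    # str.partition never raises: if there is no comma, value is "".
    key, _sep, value = line.partition(",")
    return (key, value)

# Plain-Python check of the difference (no Spark required):
line = "bob,25,retired"
print(line.split(",")[1])   # prints: 25  (only the second field)
print(to_pair(line))        # prints: ('bob', '25,retired')
```

With Spark, the helper would be passed straight to `map`, e.g. `pairRDD = rdd.map(to_pair)`.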

I store the pairRDD by using the saveAsTextFile()

pairRDD.saveAsTextFile("$savePath")

However, as investigated, the stored file contains some unnecessary characters, such as "u'", "(" and ")" (as PySpark simply calls toString() to store key-value pairs). I was wondering if I can convert back to a regular RDD, so that the saved file won't contain "u'", "(" or ")". Or is there any other storage method I can use to get rid of the unnecessary characters?
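To see where those characters come from, here is a plain-Python sketch that assumes PySpark's `saveAsTextFile` writes each non-string element as `str(element)` (which is how recent PySpark versions handle it). The helper `as_saved_line` is hypothetical and only mimics that conversion; no Spark is needed to run it.

```python
records = [("alice", "30"), ("bob", "25")]

def as_saved_line(record):
    # Assumed behavior: strings are written as-is, anything else via str().
    return record if isinstance(record, str) else str(record)

lines = [as_saved_line(r) for r in records]
for line in lines:
    print(line)
# Under Python 3 this prints:
# ('alice', '30')
# ('bob', '25')
# Under Python 2, unicode strings additionally show the u prefix, e.g. (u'alice', u'30').
```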

Answer

Those characters are the Python string representation of your data (tuples and Unicode strings). Since you use saveAsTextFile, you should convert your data to text, i.e. a single string per record. You can use map to turn each key/value tuple back into a single value, e.g.:

pairRDD.map(lambda kv: "Value %s for key %s" % (kv[1], kv[0])).saveAsTextFile(savePath)
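Note that tuple-unpacking lambdas such as `lambda (k, v): ...` are Python 2 only and are a syntax error in Python 3, hence indexing into `kv`. If the goal is to write the pairs back in their original CSV form, one option (a sketch, not part of the original answer) is to format each tuple with the standard `csv` module so values containing commas get quoted. `to_csv_line` is a hypothetical helper name.

```python
import csv
import io

def to_csv_line(kv):
    """Turn a (key, value) tuple into one CSV-formatted string (no line ending)."""
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes fields that contain commas or quotes.
    csv.writer(buf).writerow(kv)
    # writerow appends "\r\n" by default; saveAsTextFile adds its own newlines.
    return buf.getvalue().rstrip("\r\n")

# With Spark (not executed here), this would be used as:
#   pairRDD.map(to_csv_line).saveAsTextFile(savePath)
pairs = [("alice", "30"), ("bob", "25, retired")]
print([to_csv_line(kv) for kv in pairs])  # ['alice,30', 'bob,"25, retired"']
```

If only one side of each pair is needed, `pairRDD.keys()` or `pairRDD.values()` also return a regular RDD of single elements.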
