How to save a spark dataframe as a text file without Rows in pyspark?


Question

I have a dataframe "df" with the columns ['name', 'age']. I saved it with df.rdd.saveAsTextFile("..") to write it out as an RDD. When I load the saved file, collect() gives me the following result:

a = sc.textFile("\mee\sample")
a.collect()
Output:
    [u"Row(name=u'Alice', age=1)",
     u"Row(name=u'Alice', age=2)",
     u"Row(name=u'Joe', age=3)"]

These are not Row objects:

a.map(lambda g:g.age).collect()
AttributeError: 'unicode' object has no attribute 'age'
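The error arises because textFile yields plain strings: saveAsTextFile flattened each Row to its text representation, and the Row structure is gone. A minimal sketch of the same failure, needing no Spark at all:

```python
# One line of the saved text file is just a string, not a Row,
# so attribute access fails exactly as in the traceback above.
line = "Row(name=u'Alice', age=1)"

try:
    line.age  # the same access that map(lambda g: g.age) attempts
except AttributeError as err:
    print(err)  # e.g. 'str' object has no attribute 'age'
```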

Is there any way to save the dataframe as a normal RDD, without the column names and the Row keyword? I want to save the dataframe so that loading the file and calling collect() gives me the following:

a.collect()   
[(Alice,1),(Alice,2),(Joe,3)]

Answer

It is a normal RDD[Row]. The problem is that when you saveAsTextFile and load with textFile, what you get back is a bunch of strings. If you want to preserve the objects, you should use some form of serialization, for example pickleFile:

from pyspark.sql import Row

df = sqlContext.createDataFrame(
   [('Alice', 1), ('Alice', 2), ('Joe', 3)],
   ("name", "age")
)

df.rdd.map(tuple).saveAsPickleFile("foo")
sc.pickleFile("foo").collect()

## [('Joe', 3), ('Alice', 1), ('Alice', 2)]
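If a human-readable text file (rather than a binary pickle) is the goal, another option is to map each Row to a plain string before saving. This is a sketch, not part of the original answer: the output path "plain_out" is illustrative, and it assumes the same df as above.

```python
# Sketch: turn each Row into a comma-separated line before writing,
# so the text file contains lines like "Alice,1" instead of "Row(...)".
def row_to_line(row):
    # a Row behaves like a tuple of its field values
    return ",".join(str(v) for v in row)

# With a live SparkContext this would write plain text (path is made up):
# df.rdd.map(row_to_line).saveAsTextFile("plain_out")

# The helper itself can be checked without Spark:
print(row_to_line(("Alice", 1)))  # Alice,1
```

Reading such a file back of course yields strings again, and you would have to split each line and cast the fields yourself; the pickleFile approach above is the more robust round trip.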
