How can I convert a PySpark dataframe to a CSV without sending it to a file?
Question
I have a dataframe which I need to convert to a CSV file, and then I need to send this CSV to an API. Since I'm sending it to an API, I do not want to save it to the local filesystem and need to keep it in memory. How can I do this?
Answer
Easy way: convert your dataframe to a Pandas dataframe with toPandas(), then save it to a string. To save to a string rather than a file, call to_csv with path_or_buf=None. Then send the string in the API call.
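A minimal sketch of that API call, assuming a hypothetical endpoint URL and using only the standard library's urllib (a real client might use requests and would add authentication):

```python
import urllib.request

# Hypothetical endpoint and payload -- substitute your real API URL, auth,
# and the string returned by to_csv(path_or_buf=None).
url = "https://api.example.com/upload"
csv_string = "id,name\n1,alice\n"

req = urllib.request.Request(
    url,
    data=csv_string.encode("utf-8"),
    headers={"Content-Type": "text/csv"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment against a real endpoint
```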
From the to_csv() documentation:
Parameters
path_or_buf : str or file handle, default None
File path or object; if None is provided the result is returned as a string.
So your code would likely look like this:
csv_string = df.toPandas().to_csv(path_or_buf=None)
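As a quick, runnable check of this behavior, here is a sketch using a plain Pandas DataFrame standing in for the result of df.toPandas():

```python
import pandas as pd

# Stand-in for df.toPandas(); in a real job this would come from a Spark DataFrame.
pdf = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# With path_or_buf=None, to_csv returns the CSV text instead of writing a file.
csv_string = pdf.to_csv(path_or_buf=None, index=False)
```

Passing index=False drops the Pandas row index, which most APIs expecting a CSV will not want.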
Alternatives: use tempfile.SpooledTemporaryFile with a large buffer to create an in-memory file. Or you can even use a regular file: just make your buffer large enough and don't flush or close the file. Take a look at Corey Goldberg's explanation of why this works.
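A short sketch of the SpooledTemporaryFile alternative; the max_size value (an arbitrary 10 MB here) is the threshold below which the data never touches disk:

```python
import tempfile

# Below max_size bytes, the "file" lives entirely in memory;
# it only spills to an actual disk file past that size.
with tempfile.SpooledTemporaryFile(max_size=10 * 1024 * 1024, mode="w+") as buf:
    buf.write("id,name\n1,alice\n2,bob\n")  # e.g. CSV produced chunk by chunk
    buf.seek(0)
    payload = buf.read()  # read it back to hand off to an HTTP client
```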