Databricks: writing a Spark DataFrame directly to Excel

Question

Is there any way to write a Spark DataFrame directly to xls/xlsx format?

Most of the examples on the web use pandas DataFrames.

But I would like to use a Spark DataFrame to work with my data. Any ideas?

Recommended answer

I'm assuming that, because you used the "databricks" tag, you want to create an .xlsx file in the Databricks file store and that you are running your code in Databricks notebooks. I'm also going to assume that your notebooks are running Python.

There is no direct way to save an Excel document from a Spark DataFrame. You can, however, convert the Spark DataFrame to a pandas DataFrame and export from there. We'll need to start by installing the xlsxwriter package. You can do this for your notebook environment using a Databricks utilities command (on Databricks Runtime 11.0 and above, where dbutils.library.installPyPI has been removed, %pip install xlsxwriter does the same job):

dbutils.library.installPyPI('xlsxwriter')
dbutils.library.restartPython()

I ran into a few permission issues when saving an Excel file directly to DBFS. A quick workaround was to save the file to the cluster's default directory and then move it into DBFS with sudo. Here's some example code:

# Creating dummy spark dataframe
spark_df = spark.sql('SELECT * FROM default.test_delta LIMIT 100')

# Converting spark dataframe to pandas dataframe
pandas_df = spark_df.toPandas()

# Exporting pandas dataframe to xlsx file
pandas_df.to_excel('excel_test.xlsx', engine='xlsxwriter')

Then, in a new command, use %sh to run the move as a shell command:

%sh
sudo mv excel_test.xlsx /dbfs/mnt/data/
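
The same write-locally-then-move pattern can also be sketched in plain Python, which keeps everything in one notebook cell. This is only a sketch under assumptions: export_then_move and its write_file parameter are hypothetical names that are not part of the original answer, the destination is assumed to be reachable as an ordinary local path (as /dbfs/mnt/data/ is in the shell command above), and since it runs without sudo it may or may not get around the permission issues mentioned earlier.

```python
import os
import shutil
import tempfile


def export_then_move(write_file, filename, dest_dir):
    """Write a file to local scratch space, then move it into dest_dir.

    write_file is a hypothetical callable that takes a path and writes
    the file there, for example a thin wrapper around to_excel.
    """
    with tempfile.TemporaryDirectory() as scratch:
        local_path = os.path.join(scratch, filename)
        write_file(local_path)
        # shutil.move falls back to copy-then-delete when the source and
        # destination live on different filesystems.
        return shutil.move(local_path, os.path.join(dest_dir, filename))


# Assumed usage in a notebook cell, with pandas_df from the earlier step:
# export_then_move(
#     lambda path: pandas_df.to_excel(path, engine='xlsxwriter'),
#     'excel_test.xlsx',
#     '/dbfs/mnt/data/',
# )
```

Passing the writer in as a callable keeps the helper independent of pandas, so the same function would work for any file that has to be produced locally before it lands in DBFS.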
