Pandas: Write to Excel not working in Databricks


Problem description

I was trying to convert a parquet file to an Excel file. However, when I try to do so with pandas, using either the openpyxl or the xlsxwriter engine, it shows an "Operation not supported" error. I can, however, read Excel files with the openpyxl engine in Databricks.

Reading works with the following code:

import pandas as pd

xlfile = '/dbfs/mnt/raw/BOMFILE.xlsx'
tmp_csv = '/dbfs/mnt/trusted/BOMFILE.csv'

# Reading the Excel file through the /dbfs fuse mount works fine
pdf = pd.DataFrame(pd.read_excel(xlfile, engine='openpyxl'))
pdf.to_csv(tmp_csv, index=None, header=True)

However, when I tried to write the same way, using openpyxl as well as xlsxwriter, it does not work:

parq = '/mnt/raw/PRODUCT.parquet'
final = '/dbfs/mnt/trusted/PRODUCT.xlsx'

# Read the parquet file with Spark and convert it to a pandas DataFrame
df = spark.read.format("parquet").option("header", "true").load(parq)
pandas_df = df.toPandas()

# Writing the Excel file directly to the /dbfs fuse mount fails
pandas_df.to_excel(final, engine='openpyxl')
#pandas_df.to_excel(outfile, engine='xlsxwriter')#, sheet_name=tbl)

The error I get:

FileCreateError: [Errno 95] Operation not supported

OSError: [Errno 95] Operation not supported
During handling of the above exception, another exception occurred:
FileCreateError                           Traceback (most recent call last)
<command-473603709964454> in <module>
     17       final = '/dbfs/mnt/trusted/PRODUCT.xlsx'
     18       print(outfile)
---> 19       pandas_df.to_excel(outfile, engine='openpyxl')
     20       #pandas_df.to_excel(outfile, engine='xlsxwriter')#, sheet_name=tbl)

/databricks/python/lib/python3.7/site-packages/pandas/core/generic.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes)
   2179             startcol=startcol,
   2180             freeze_panes=freeze_panes,
-> 2181             engine=engine,
   2182         )
   2183 

Please suggest a solution.

Recommended answer

The problem is that there are limitations to the local file API support in DBFS (the /dbfs FUSE mount). For example, it doesn't support the random writes that writing Excel files requires. From the documentation:

Does not support random writes. For workloads that require random writes, perform the I/O on local disk first and then copy the result to /dbfs.

In your case it could be:

from shutil import copyfile

parq = '/mnt/raw/PRODUCT.parquet'
final = '/dbfs/mnt/trusted/PRODUCT.xlsx'
temp_file = '/tmp/PRODUCT.xlsx'

df = spark.read.format("parquet").option("header", "true").load(parq)
pandas_df = df.toPandas()

# Write the Excel file to local disk first (random writes are supported there) ...
pandas_df.to_excel(temp_file, engine='openpyxl')

# ... then copy the finished file to the /dbfs fuse mount
copyfile(temp_file, final)

P.S. You can also use dbutils.fs.cp to copy the file (doc) - it will also work on Community Edition, where the /dbfs fuse mount isn't supported.
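
A minimal sketch of that variant, assuming the same mount paths as above (note that dbutils.fs paths use the file:/ and dbfs:/ schemes rather than the /dbfs prefix, and that dbutils is only available inside a Databricks notebook):

# Write the Excel file to the driver's local disk first
temp_file = '/tmp/PRODUCT.xlsx'
pandas_df.to_excel(temp_file, engine='openpyxl')

# file:/ points at the driver's local disk, dbfs:/ at the DBFS destination
dbutils.fs.cp('file:/tmp/PRODUCT.xlsx', 'dbfs:/mnt/trusted/PRODUCT.xlsx')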
