Write Python Dataframe to CSV file directly in Azure Datalake

Problem description

I have imported an Excel file into a pandas DataFrame and have completed the data exploration and cleaning process.

I now want to write the cleaned DataFrame to a CSV file back in Azure Data Lake, without first saving it as a local file. I am using pandas 3.

My code is as follows:

token = lib.auth(tenant_id = '', 
                 client_secret ='', 
                 client_id = '')

adl = core.AzureDLFileSystem(token, store_name)

with adl.open(path='Raw/Gold/Myfile.csv', mode='wb') as f:
    in_xls.to_csv(f, encoding='utf-8')  # this line raises the error
    f.close()

I get the following error on the to_csv line:

TypeError: a bytes-like object is required, not 'str'

I also tried the following, without success:

with adl.open(path='Raw/Gold/Myfile.csv', mode='wb') as f:
    with io.BytesIO(in_xls) as byte_buf:
        byte_buf.to_csv(f, encoding='utf-8')
        f.close()

I get the following error:

TypeError: a bytes-like object is required, not 'DataFrame'

Any ideas or tips would be much appreciated.

Recommended answer

I got this working with pandas the other day on Python 3.x. This code runs on an on-premises machine and connects to the Azure Data Lake store in the cloud.

Assuming df is a pandas DataFrame, you can use the following code:

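from azure.datalake.store import core

# token is the login token created by whichever ADLS login method you chose
# (for example lib.auth). Personally I use the service principal login.
adl = core.AzureDLFileSystem(token, store_name='YOUR_ADLS_STORE_NAME')

# to_csv() with no path returns the CSV as a str; encode it to bytes
# because the file is opened in binary mode ('wb').
df_str = df.to_csv()
with adl.open('/path/to/file/on/adls/newfile.csv', 'wb') as f:
    f.write(df_str.encode('utf-8'))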

The key is converting the DataFrame to a string and then encoding that string to bytes with str.encode(), because a file opened in 'wb' mode accepts bytes rather than str.
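The str-versus-bytes mismatch can be reproduced without an Azure account. The sketch below is only an illustration under stated assumptions: io.BytesIO stands in for the handle returned by adl.open(..., 'wb'), and the literal csv_text stands in for the string that df.to_csv() returns. Both names are hypothetical and not part of the original answer.

```python
import io

# Hypothetical stand-in for the str returned by df.to_csv() with no path.
csv_text = "id,name\n1,alpha\n2,beta\n"

# Hypothetical stand-in for the handle from adl.open(path, 'wb').
# io.BytesIO, like any binary-mode handle, accepts bytes only.
binary_handle = io.BytesIO()

# Writing the str directly fails, as in the question.
try:
    binary_handle.write(csv_text)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Encoding first turns the str into bytes, which the handle accepts.
payload = csv_text.encode("utf-8")
written = binary_handle.write(payload)

print(error_message)
print(written == len(payload))
```

The same reasoning suggests why passing the handle straight to to_csv failed in the question: pandas was writing text to a handle that expected bytes.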

Hope this helps.
