Reading Excel files from an "input" blob storage container and exporting to CSV in an "output" container with Python

Problem description

I'm trying to develop a Python script that reads an .xlsx file from a blob storage container called "source", converts it to .csv, and stores it in a new container. I'm testing the script locally; if it works, I will include it in an ADF pipeline. So far I have managed to access the blob storage, but I'm having problems reading the file content.

from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient
import pandas as pd

conn_str = "DefaultEndpointsProtocol=https;AccountName=XXXXXX;AccountKey=XXXXXX;EndpointSuffix=core.windows.net"
container = "source"
blob_name = "prova.xlsx"

container_client = ContainerClient.from_connection_string(
    conn_str=conn_str, 
    container_name=container
    )
# Download blob as StorageStreamDownloader object (stored in memory)
downloaded_blob = container_client.download_blob(blob_name)

df = pd.read_excel(downloaded_blob)

print(df)

I get the following error:

ValueError: Invalid file path or buffer object type: <class 'azure.storage.blob._download.StorageStreamDownloader'>

I tried a .csv file as input, with the parsing code written as follows:

df = pd.read_csv(StringIO(downloaded_blob.content_as_text()) )

That works.

Any suggestions on how to modify the code so that the Excel file becomes readable?

Recommended answer

I summarize the solution as follows.

pd.read_excel() in pandas expects a file path, a file-like object, or bytes as input. When we call download_blob() to download the Excel file from Azure Blob Storage, we get an azure.storage.blob.StorageStreamDownloader, which pandas does not recognize as any of these, hence the ValueError. So we need to call readall() or content_as_bytes() on the downloader to convert it to bytes, and pass those bytes to pd.read_excel(); wrapping them in io.BytesIO first is the safer form, since recent pandas versions warn when given raw bytes. This mirrors the .csv case in the question, where content_as_text() turned the downloader into a string before parsing. For more details, refer to the pandas and azure-storage-blob documentation.
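As a minimal sketch of the full round trip described in the question, the helper below reads the .xlsx blob into memory, parses it with pandas, and uploads the result as .csv. The function name excel_blob_to_csv, the main() wrapper, and the destination container name "output" are assumptions made for illustration; they do not come from the original post. Reading .xlsx with pandas also requires the openpyxl package to be installed.

```python
from io import BytesIO

import pandas as pd


def excel_blob_to_csv(source_container, dest_container, blob_name, csv_name=None):
    """Read an .xlsx blob and upload it as .csv (hypothetical helper).

    Both container arguments are expected to behave like
    azure.storage.blob.ContainerClient objects.
    """
    # download_blob() returns a StorageStreamDownloader; readall() yields raw bytes.
    excel_bytes = source_container.download_blob(blob_name).readall()

    # Wrap the bytes in BytesIO so pandas receives a file-like object.
    df = pd.read_excel(BytesIO(excel_bytes))

    # Default target name: same base name with a .csv extension.
    if csv_name is None:
        csv_name = blob_name.rsplit(".", 1)[0] + ".csv"

    # Serialize to CSV in memory and upload, replacing any existing blob.
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    dest_container.upload_blob(name=csv_name, data=csv_bytes, overwrite=True)
    return csv_name


def main(conn_str):
    """Example wiring against a real storage account (not called automatically)."""
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    source = service.get_container_client("source")
    dest = service.get_container_client("output")  # assumed container name
    return excel_blob_to_csv(source, dest, "prova.xlsx")
```

Keeping the conversion in a function that only depends on the two container clients makes it easy to try locally with stand-in objects before wiring it into an ADF pipeline. Everything is held in memory, so this approach suits files that fit comfortably in RAM.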
