Write dataframe to blob using Azure Databricks

Question

Is there any link or sample code showing how to write a dataframe to Azure Blob Storage using Python (not using the PySpark module)?

Answer

Below is a code snippet for writing a (Spark) dataframe as CSV data directly to an Azure Blob Storage container from an Azure Databricks notebook.

# Placeholder values -- replace these with your own storage account, container, and key
storage_name = "<storage-account-name>"
output_container_name = "<output-container-name>"
sas_key = "<storage-account-access-key>"  # fs.azure.account.key expects the account access key

# Configure blob storage account access key globally
spark.conf.set(
  "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
  sas_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# Write the dataframe (an existing Spark DataFrame) as a single CSV file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("csv")  # built-in CSV source; equivalent to the legacy "com.databricks.spark.csv"
 .save(output_blob_folder))

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_files = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from the sub-folder (wrangled_data_folder) to the root
# of the blob container, renaming it in the process
dbutils.fs.mv(output_files[0].path, "%s/predict-transform-output.csv" % output_container_path)
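
Note that Spark normally writes one part- file per partition; coalesce(1) forces a single output file, which is why the snippet then moves and renames the lone part- file afterwards.

The snippet above still relies on the Spark DataFrame writer. Since the question asks for plain Python without the PySpark module, here is a minimal sketch of the same upload done with pandas and the azure-storage-blob SDK (v12) instead. The account, key, container, and blob names below are placeholder assumptions, not values from the original answer.

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Placeholder values -- substitute your own (assumptions, not from the original answer)
account_name = "<storage-account-name>"
account_key = "<storage-account-access-key>"
container_name = "<output-container-name>"

# Build a client from the storage account connection string
connection_string = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=%s;AccountKey=%s;"
    "EndpointSuffix=core.windows.net" % (account_name, account_key)
)
blob_service = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service.get_blob_client(
    container=container_name, blob="predict-transform-output.csv")

# Serialize the dataframe to CSV in memory and upload it as a single blob
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # example data
csv_bytes = df.to_csv(index=False).encode("utf-8")
blob_client.upload_blob(csv_bytes, overwrite=True)

Because the CSV is produced in memory by pandas, there is no part- file shuffle here; the blob lands directly under the name you give it.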

Sample: notebook

Output: writing the dataframe to blob storage using Azure Databricks
