How do I rename the file that was saved on a datalake in Azure


Question


I tried to merge two files in a Datalake using Scala in Databricks and saved the result back to the Datalake with the following code:

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("adl://xxxxxxxx/Test/CSV")

df.coalesce(1).write
  .format("com.databricks.spark.csv")
  .mode("overwrite")
  .option("header", "true")
  .save("adl://xxxxxxxx/Test/CSV/final_data.csv")


However, final_data.csv is saved as a directory rather than a single file: the directory contains several files, and the actual .csv data ends up in a part file named like 'part-00000-tid-dddddddddd-xxxxxxxxxx.csv'.


How do I rename this file so that I can move it to another directory?
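This behavior is expected: Spark always writes its output as a directory, and `coalesce(1)` only guarantees a single part file inside that directory. You can confirm this from a Databricks notebook by listing the output path (a sketch; the `adl://xxxxxxxx` placeholder is from the question, and `dbutils` is only available inside a Databricks runtime):

```scala
// The "file" Spark wrote is actually a directory holding one part file
// plus bookkeeping files such as _SUCCESS. Listing it shows the contents.
dbutils.fs.ls("adl://xxxxxxxx/Test/CSV/final_data.csv")
  .foreach(f => println(f.name))
```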

Answer


Got it. The file can be renamed and moved to another destination with the following code. Note that the source files that were merged are also deleted.

val x = "Source"
val y = "Destination"

// Read all CSVs from the source directory.
val df = sqlContext.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(x + "/")

// Write a single part file (Spark still creates a directory around it).
df.repartition(1).write
  .format("csv")
  .mode("overwrite")
  .option("header", "true")
  .save(y + "/" + "final_data.csv")

// Delete the source CSVs that were merged.
dbutils.fs.ls(x)
  .filter(file => file.name.endsWith("csv"))
  .foreach(f => dbutils.fs.rm(f.path, true))

// Move the part file out of the output directory under the desired name,
// then remove the directory Spark created.
val partFile = dbutils.fs.ls(y + "/" + "final_data.csv")
  .filter(file => file.name.startsWith("part-00000"))(0)
  .path
dbutils.fs.mv(partFile, y + "/" + "data.csv")
dbutils.fs.rm(y + "/" + "final_data.csv", true)
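An alternative to `dbutils` is the Hadoop `FileSystem` API, which works from any Spark job, assuming the cluster's Hadoop configuration already carries the ADLS credentials. A sketch, not tested against a live lake; `y` mirrors the destination variable above:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Resolve the filesystem for the destination URI using the active
// Spark session's Hadoop configuration.
val conf = spark.sparkContext.hadoopConfiguration
val fs = FileSystem.get(new java.net.URI(y), conf)

// Locate the single part file inside the output directory...
val partFile = fs.globStatus(new Path(y + "/final_data.csv/part-00000*"))(0).getPath

// ...rename it to the final name and drop the wrapper directory.
fs.rename(partFile, new Path(y + "/data.csv"))
fs.delete(new Path(y + "/final_data.csv"), true)
```

This avoids a copy: `rename` is a metadata operation on most filesystems, whereas `dbutils.fs.mv` may copy and delete across stores.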

