AWS Glue output file name

Question

I am using AWS Glue to transform some JSON files. I have added the files from S3 to Glue. The job I set up reads the files fine and runs successfully, and a file is written to the correct S3 bucket. The problem is that I can't name the file: it is given a random name, and it also lacks the .json extension.

How can I name the file and also add the extension to the output?

Recommended answer

Because of how Spark works, you can't choose the output file name: each partition is written by a separate task to its own part-* file with a generated name. However, you can rename the file right after the write finishes.

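```python
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()

# Hadoop classes reached through the py4j gateway
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem

# bucket_name, source_name, partition_year, partition_week, desired_name and df
# are placeholders for your own values
fs = FileSystem.get(URI(f"s3://{bucket_name}"), sc._jsc.hadoopConfiguration())

file_path = f"s3://{bucket_name}/processed/source={source_name}/year={partition_year}/week={partition_week}/"
# coalesce(1) yields a single part file; the JSON writer's gzip option is "compression"
df.coalesce(1).write.format("json").mode("overwrite").option("compression", "gzip").save(file_path)

# rename the created file (on S3 this is a copy followed by a delete)
created_file_path = fs.globStatus(Path(file_path + "part*.gz"))[0].getPath()
fs.rename(created_file_path, Path(file_path + f"{desired_name}.jl.gz"))
```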
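
If you would rather not go through the JVM gateway, the same rename can be done after the write with the AWS SDK. The following is a minimal sketch, assuming boto3 is available and that the output prefix holds exactly one part file (as it does after coalesce(1)). The function name rename_single_part and its parameters are hypothetical. Because S3 has no rename operation, the sketch copies the object to the new key and then deletes the original.

```python
def rename_single_part(s3, bucket, prefix, new_name):
    """Rename the single part file Spark wrote under `prefix` to `prefix + new_name`.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    rename_single_part is a hypothetical helper, not part of Glue or boto3.
    """
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Keep only Spark part files; this skips the _SUCCESS marker
    parts = [
        obj["Key"]
        for obj in resp.get("Contents", [])
        if obj["Key"][len(prefix):].startswith("part-")
    ]
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly one part file under {prefix!r}, found {len(parts)}"
        )
    src = parts[0]
    dst = prefix + new_name
    # S3 has no rename: copy to the new key, then delete the original object
    s3.copy_object(Bucket=bucket, CopySource={"Bucket": bucket, "Key": src}, Key=dst)
    s3.delete_object(Bucket=bucket, Key=src)
    return dst
```

For example, rename_single_part(boto3.client("s3"), "my-bucket", "processed/source=a/year=2020/week=1/", "output.jl.gz"), where the bucket and prefix are placeholder values. Note that a single copy_object call is limited to objects of up to 5 GB. Raising an error when the prefix does not contain exactly one part file avoids silently renaming the wrong file if the job was not coalesced.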
