AWS Glue output file name

Question

I am using AWS Glue to transform some JSON files. I added the files to Glue from S3. The job I set up reads the files correctly and runs successfully, and an output file is written to the correct S3 bucket. The problem is that I can't name the file: it is given a random name, and it doesn't get a .json extension either.

How can I name the file and add the extension to the output?

Recommended answer

Because of how Spark works, it isn't possible to choose the output file name at write time: Spark writes each partition as its own part-* file under the target path. However, the file can be renamed right afterward.

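# Assumes sc (SparkContext) and df (DataFrame) already exist in the Glue job,
# and that the {placeholders} below are replaced with real values.
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
fs = FileSystem.get(URI("s3://{bucket_name}"), sc._jsc.hadoopConfiguration())

file_path = "s3://{bucket_name}/processed/source={source_name}/year={partition_year}/week={partition_week}/"

# coalesce(1) makes Spark write a single part file.
# "compression" is the documented option name for the JSON writer.
(df.coalesce(1)
   .write.format("json")
   .mode("overwrite")
   .option("compression", "gzip")
   .save(file_path))

# rename the created file (on S3 this is a copy followed by a delete)
created_file_path = fs.globStatus(Path(file_path + "part*.gz"))[0].getPath()
fs.rename(
    created_file_path,
    Path(file_path + "{desired_name}.jl.gz"))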
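
The same rename can also be done without going through the JVM gateway. The following is only a sketch of that alternative, not part of the original answer: it assumes a boto3-style S3 client is passed in, and the helper names `pick_part_key` and `rename_part_file` are made up for illustration. It also assumes the prefix holds exactly one part file (as after `coalesce(1)`) and that the object is small enough for a single `copy_object` call.

```python
def pick_part_key(keys, prefix, suffix=".gz"):
    """Return the single Spark part file among `keys` under `prefix`.

    Hypothetical helper: Spark names its output files part-..., so we
    match on the file name rather than on the whole key.
    """
    matches = [
        k for k in keys
        if k.startswith(prefix)
        and k.rsplit("/", 1)[-1].startswith("part-")
        and k.endswith(suffix)
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one part file, found {len(matches)}")
    return matches[0]


def rename_part_file(s3, bucket, prefix, new_name):
    """Copy the part file under `prefix` to `prefix + new_name`, then delete it.

    `s3` is assumed to be a boto3 S3 client (or any object exposing the same
    list_objects_v2 / copy_object / delete_object methods).
    S3 has no native rename, so this is a copy followed by a delete.
    """
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    source_key = pick_part_key(keys, prefix)
    target_key = prefix + new_name
    s3.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    s3.delete_object(Bucket=bucket, Key=source_key)
    return target_key
```

In a Glue job this might be called after the write as `rename_part_file(boto3.client("s3"), "my-bucket", "processed/source=x/year=2020/week=1/", "output.jl.gz")`, where the bucket, prefix, and file name are placeholders. Note that the prefix here is the object key prefix without the `s3://bucket/` part.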
