AWS Glue output file name
Question
I am using AWS Glue to transform some JSON files. I added the files to Glue from S3. The job I set up reads the files correctly and runs successfully, and an output file is written to the correct S3 bucket. The problem is that I can't name the file: it is given a random name and no .json extension.
How can I name the output file and give it the extension?
Recommended answer
Because of how Spark works, the output file name cannot be chosen at write time: Spark writes part files with generated names. The file can, however, be renamed immediately after the write.
# Assumes `sc` is the job's SparkContext and `df` is the DataFrame to write.
# The {...} segments are placeholders to fill in with real values.
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
fs = FileSystem.get(URI("s3://{bucket_name}"), sc._jsc.hadoopConfiguration())

file_path = "s3://{bucket_name}/processed/source={source_name}/year={partition_year}/week={partition_week}/"

# coalesce(1) produces a single part file; "compression" is the documented JSON writer option
df.coalesce(1).write.format("json").mode("overwrite").option("compression", "gzip").save(file_path)

# rename created file
created_file_path = fs.globStatus(Path(file_path + "part*.gz"))[0].getPath()
fs.rename(created_file_path, Path(file_path + "{desired_name}.jl.gz"))
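S3 has no native rename operation, so a rename on S3 amounts to a copy followed by a delete. As an alternative to going through the Hadoop FileSystem API, the rename step above could be sketched with an S3 client instead. This is only a sketch under assumptions: the function name rename_single_part_file and its parameters are hypothetical, the client is assumed to be a boto3 S3 client (list_objects_v2, copy_object, delete_object), the prefix is assumed to hold fewer than 1000 objects so a single listing page is enough, and the part file is assumed to be under 5 GB, the limit for a single copy_object call.

```python
def rename_single_part_file(s3_client, bucket, prefix, new_name):
    """Copy the single part-* object under `prefix` to `prefix + new_name`,
    then delete the original. Returns the new object key.

    `s3_client` is assumed to behave like boto3.client("s3").
    """
    # List everything under the prefix (a single page; assumes < 1000 objects).
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)

    # Keep only Spark part files that sit directly under the prefix.
    part_keys = [
        obj["Key"]
        for obj in response.get("Contents", [])
        if obj["Key"][len(prefix):].startswith("part-")
    ]
    if len(part_keys) != 1:
        raise ValueError(
            f"expected exactly one part file under {prefix!r}, found {len(part_keys)}"
        )

    source_key = part_keys[0]
    target_key = prefix + new_name

    # S3 "rename" = copy to the new key, then delete the old key.
    s3_client.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    s3_client.delete_object(Bucket=bucket, Key=source_key)
    return target_key
```

A call might look like rename_single_part_file(boto3.client("s3"), "my-bucket", "processed/source=a/year=2020/week=1/", "output.jl.gz"), where the bucket, prefix, and file name are made-up values. Raising an error when there is not exactly one part file guards against forgetting coalesce(1) in the write.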