Glue Job fails to write file

Problem Description

I am backfilling some data via Glue jobs. The job itself reads a TSV from S3, transforms the data slightly, and writes it to S3 as Parquet. Since I already have the data, I am trying to launch multiple jobs at once to reduce the amount of time needed to process it all. When I launch multiple jobs at the same time, I sometimes run into an issue where one of the jobs fails to output the resulting Parquet files in S3. The job itself completes successfully without throwing an error. When I rerun the job as a non-parallel task, the files are output correctly. Is there some issue, either with Glue (or the underlying Spark) or S3, that would cause my problem?
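For reference, a minimal sketch of the kind of job described above, assuming a standard PySpark Glue script; the bucket names, paths, and transformation step are placeholders, not the actual job:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the TSV input (tab-separated; a header row is assumed).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-input-bucket/tsv/"]},
    format="csv",
    format_options={"separator": "\t", "withHeader": True},
    transformation_ctx="read_tsv",
)

# ... the slight transformation of the data would go here ...

# Write the result back to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-output-bucket/parquet/"},
    format="parquet",
    transformation_ctx="write_parquet",
)

job.commit()
```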

Recommended Answer

The same Glue job running in parallel may produce files with the same names, and some of them can therefore be overwritten. If I remember correctly, the transformation context (the transformation_ctx argument) is used as part of the file name. I assume you don't have job bookmarks enabled, so it should be safe for you to generate the transformation context value dynamically to ensure it is unique for each job run.
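A sketch of that suggestion, reusing the names from the job sketch in the question: a per-run suffix is appended to the transformation_ctx so parallel runs never reuse the same value. This is only safe with job bookmarks disabled, since bookmarks rely on a transformation_ctx that stays stable across runs.

```python
import uuid

# Per-run suffix so parallel runs of the same job never share a
# transformation_ctx, and therefore never collide on output names.
# Only safe with job bookmarks disabled: bookmarks key off a stable
# transformation_ctx value across runs.
run_suffix = uuid.uuid4().hex

glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-output-bucket/parquet/"},
    format="parquet",
    transformation_ctx=f"write_parquet_{run_suffix}",
)
```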
