AWS Glue ETL job from AWS Redshift to S3 fails


Problem description

I am trying out the AWS Glue service to ETL some data from Redshift to S3. The crawler runs successfully and creates the metadata table in the Data Catalog. However, when I run the ETL job (generated by AWS), it fails after around 20 minutes with the message "Resource unavailable".

I cannot see any AWS Glue logs or error logs in CloudWatch. When I try to view them, it says: "Log stream not found. The log stream jr_xxxxxxxxxx could not be found. Check if it was correctly created and retry."

I would appreciate it if you could provide any guidance to resolve this issue.

Recommended answer

So basically, a job you add to Glue will run only if there is not too much traffic in the region where your Glue job lives. If no resources are available, you need to either manually re-add the job or subscribe to the job's events from CloudWatch via SNS.
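The CloudWatch-plus-SNS idea can be sketched as an event rule that forwards unsuccessful Glue job runs to an SNS topic. This is a minimal sketch, not the answerer's own setup: the rule name, job name, and topic ARN are hypothetical, and the client is passed in (for example `boto3.client("events")`) so the snippet carries no AWS dependency of its own. Whether a run that never acquires resources emits a matching state-change event is an assumption worth verifying in your account.

```python
import json


def glue_failure_event_pattern(job_name):
    """Build an event pattern matching unsuccessful runs of one Glue job."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": [job_name],
            "state": ["FAILED", "TIMEOUT", "STOPPED"],
        },
    }


def register_failure_rule(events_client, rule_name, job_name, sns_topic_arn):
    """Create (or update) a rule and point it at an SNS topic.

    events_client is assumed to behave like boto3.client("events").
    rule_name and sns_topic_arn are hypothetical values chosen by the caller.
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(glue_failure_event_pattern(job_name)),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notify-sns", "Arn": sns_topic_arn}],
    )
```

With real credentials this would be called as `register_failure_rule(boto3.client("events"), "glue-job-failed", "my-job", topic_arn)`. The SNS topic's access policy also has to allow the events service to publish to it, which this sketch does not configure.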

Also, there are parameters you can pass to the job, such as the maximum number of retries (MaxRetries) and the timeout (Timeout, in minutes).

If you get "Resource unavailable", it won't trigger a retry, because the job did not fail; it never even started. But if you set the timeout to, say, 60 minutes, it will trigger an error after that time, decrement your retry pool, and relaunch the job.
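The timeout-then-retry behaviour can also be reproduced on the client side, which is one way to automate the "manually re-add the job" option. The sketch below is an illustration under assumptions, not Glue's built-in retry mechanism: `run_with_retries` is a hypothetical helper, and `glue_client` is assumed to behave like `boto3.client("glue")`, whose `start_job_run` accepts a `Timeout` in minutes and whose `get_job_run` reports a `JobRunState`.

```python
import time

# Run states after which resubmitting the job makes sense.
RETRYABLE_STATES = {"FAILED", "TIMEOUT", "ERROR"}
# Run states after which polling should stop.
TERMINAL_STATES = RETRYABLE_STATES | {"SUCCEEDED", "STOPPED"}


def run_with_retries(glue_client, job_name, timeout_minutes=60,
                     max_attempts=3, poll_seconds=30, sleep=time.sleep):
    """Start a Glue job with a timeout and resubmit it if the run fails.

    Returns (attempts_used, final_state).
    """
    state = None
    for attempt in range(1, max_attempts + 1):
        run_id = glue_client.start_job_run(
            JobName=job_name, Timeout=timeout_minutes
        )["JobRunId"]

        # Poll until this run reaches a terminal state.
        while True:
            run = glue_client.get_job_run(JobName=job_name, RunId=run_id)
            state = run["JobRun"]["JobRunState"]
            if state in TERMINAL_STATES:
                break
            sleep(poll_seconds)

        # Succeeded or deliberately stopped: do not resubmit.
        if state not in RETRYABLE_STATES:
            return attempt, state

    return max_attempts, state
```

Passing `sleep` as a parameter keeps the polling loop easy to exercise without waiting; in normal use the default `time.sleep` applies.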
