How to stop / exit an AWS Glue Job (PySpark)?

Problem description

I have a successfully running AWS Glue job that transforms data for predictions. I would like to stop processing and output a status message (which is working) if I reach a specific condition:

if specific_condition is None:
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path )
    return None

This produces "SyntaxError: 'return' outside function", so I tried:

if specific_condition is None:
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path )
    job.commit()

This is not running in AWS Lambda; it is a Glue job that gets started from Lambda (e.g., via start_job_run()).
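
For context, a minimal sketch of how such a Glue job might be started from a Lambda function with boto3 is shown below; the job name and argument key are hypothetical placeholders, not values from the question:

import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Kick off the Glue job; start_job_run returns immediately with a run id.
    response = glue.start_job_run(
        JobName="my-transform-job",                                 # hypothetical job name
        Arguments={"--input_path": event.get("input_path", "")},    # hypothetical job argument
    )
    return {"JobRunId": response["JobRunId"]}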

Recommended answer

There is no return in Glue Spark jobs; job.commit() just signals Glue that the job's task completed, nothing more, and the script keeps running after that call. To end your job once your processing is complete, you'll have to either:

  1. Call sys.exit(STATUS_CODE) (the status code can be any value; see the sketch after the note below), or
  2. Structure your code around the condition so that there are no lines of code after job.commit().

Please note that if sys.exit is called before job.commit(), the Glue job will be marked as failed.
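
Putting this together, here is a minimal sketch of option 1; the condition, bucket, key, and message below are placeholders standing in for the question's own values:

import sys
import json
import boto3
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job setup
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

s3 = boto3.client("s3")

# Placeholders for the values computed earlier in the question's script
specific_condition = None
output_bucket = "my-output-bucket"                      # hypothetical bucket
json_path = "status/early-exit.json"                    # hypothetical key
json_str = json.dumps({"status": "nothing to process"})

if specific_condition is None:
    # Write the status message, commit, then end the whole script.
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path)
    job.commit()  # commit BEFORE sys.exit, otherwise the job run is marked as failed
    sys.exit(0)   # per the answer above, the status code can be anything; the script ends here

# ... the normal transformation logic runs only when the condition is not met ...
job.commit()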
