AWS Glue: Access Workflow Parameters from Within a Job

Question

How can I retrieve Glue workflow parameters from within a Glue job?

I have an AWS Glue job of type "python shell" that is triggered periodically from within a glue workflow.

The job's code is to be reused from within a large number of different workflows so I'm looking to retrieve workflow parameters to eliminate the need for redundant jobs.

The AWS Developer Guide provides the following tutorial: https://docs.aws.amazon.com/glue/latest/dg/workflow-run-properties-code.html

But I've been unable to get the sample code to execute without triggering errors. I suspect that this example may only apply to Scala/PySpark jobs and not to Python shell jobs.

I've tried the following code from within the relevant job:

import sys
import boto3
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['JOB_NAME','WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
workflow_name = args['WORKFLOW_NAME']
workflow_run_id = args['WORKFLOW_RUN_ID']
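# Create the Glue client; the snippet as posted used glue_client without defining it
glue_client = boto3.client('glue')
workflow_params = glue_client.get_workflow_run_properties(Name=workflow_name,
                                    RunId=workflow_run_id)["RunProperties"]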

print(workflow_name, workflow_run_id, workflow_params)

When I trigger the workflow on demand I receive the following error messages:

> Traceback (most recent call last):
> File "/tmp/runscript.py", line 115, in <module>
> runpy.run_path(temp_file_path, run_name='__main__')
> File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
> pkg_name=pkg_name, script_name=fname)
> File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
> mod_name, mod_spec, pkg_name, script_name)
> File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
> exec(code, run_globals)
> File "/tmp/glue-python-scripts-w4fbwl3n/map_etl_python_shell_test_env.py", line 10, in <module>
> File "/glue/lib/awsglue/utils.py", line 10, in getResolvedOptions
> parsed, extra = parser.parse_known_args(args)
> File "/usr/local/lib/python3.6/argparse.py", line 1766, in parse_known_args
> namespace, args = self._parse_known_args(args, namespace)
> File "/usr/local/lib/python3.6/argparse.py", line 2001, in _parse_known_args
> ', '.join(required_actions))
> File "/usr/local/lib/python3.6/argparse.py", line 2393, in error
> self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
> File "/usr/local/lib/python3.6/argparse.py", line 2380, in exit
> _sys.exit(status)
> SystemExit: 2
> 
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
> File "/tmp/runscript.py", line 134, in <module>
> raise e_type(e_value).with_tracsback(new_stack)
> AttributeError: 'SystemExit' object has no attribute 'with_tracsback'
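
The SystemExit: 2 in the traceback is raised by argparse, which getResolvedOptions uses internally; argparse exits with status 2 when a required argument is absent from sys.argv. One way to find out which of the requested names the job did not actually receive is to inspect sys.argv directly. The following is only a diagnostic sketch: missing_options is a hypothetical helper, not part of awsglue, and it assumes arguments arrive as "--NAME value" or "--NAME=value".

```python
import sys


def missing_options(argv, names):
    """Return the names from `names` that have no matching --NAME entry in argv."""
    present = set()
    for token in argv:
        if token.startswith("--"):
            # Handle both "--NAME value" and "--NAME=value" forms.
            present.add(token[2:].split("=", 1)[0])
    return [name for name in names if name not in present]


if __name__ == "__main__":
    wanted = ["JOB_NAME", "WORKFLOW_NAME", "WORKFLOW_RUN_ID"]
    # Log the raw arguments and the requested names that are absent.
    print("argv:", sys.argv)
    print("missing:", missing_options(sys.argv, wanted))
```

Running this at the top of the job, before getResolvedOptions, shows in the job log which names would make the call fail, so they can be dropped from the requested list or supplied as job arguments.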

Recommended answer

The boto3 library provides a useful function, get_job, which returns a job's definition, including its default arguments:

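import boto3

# Replace "my-region" and "my-job-name" with your own values
glue = boto3.client(service_name='glue', region_name="my-region")
job = glue.get_job(JobName="my-job-name")

# DefaultArguments holds the job's default parameters, keyed by "--name"
default_parameters = job['Job']['DefaultArguments']
my_parameter = default_parameters['--my-parameter']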

In this way you should be able to access the Glue job's arguments through default_parameters. I'm not sure whether this works directly inside the Glue job, but an external script should be able to handle the job's arguments.
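
For the workflow run properties asked about in the question, a minimal sketch could read the two workflow arguments leniently from sys.argv and then call get_workflow_run_properties, the same boto3 call used in the question. It assumes the workflow passes --WORKFLOW_NAME and --WORKFLOW_RUN_ID to the job, as the linked AWS tutorial implies; read_option, workflow_run_properties, and main are hypothetical names introduced here for illustration, and this has not been verified on a Python shell job.

```python
import sys


def read_option(argv, name, default=None):
    """Return the value given for --name in argv, or `default` if it is absent."""
    flag = "--" + name
    for i, token in enumerate(argv):
        if token == flag and i + 1 < len(argv):
            return argv[i + 1]          # "--NAME value" form
        if token.startswith(flag + "="):
            return token[len(flag) + 1:]  # "--NAME=value" form
    return default


def workflow_run_properties(glue_client, argv):
    """Fetch the run properties of the workflow run that started this job.

    Returns an empty dict when the workflow arguments are not present,
    for example when the job is started outside a workflow.
    """
    name = read_option(argv, "WORKFLOW_NAME")
    run_id = read_option(argv, "WORKFLOW_RUN_ID")
    if name is None or run_id is None:
        return {}
    response = glue_client.get_workflow_run_properties(Name=name, RunId=run_id)
    return response["RunProperties"]


def main():
    # boto3 is imported here so the helpers above can be exercised without AWS access.
    import boto3
    print(workflow_run_properties(boto3.client("glue"), sys.argv))
```

Inside the job script, calling main() prints the properties. Because the missing-argument case returns an empty dict instead of exiting, the same script can be reused across workflows and also run on its own.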
