Retrieving an S3 path from the payload inside an AWS Glue Python shell job


Problem description

I have a Python shell job in AWS Glue that needs to download a file from an S3 path. The S3 path is variable, so it is passed to the Glue job as a payload in the start_job_run call, like this:

import boto3

glue = boto3.client('glue')

# getResolvedOptions looks up arguments passed as '--name', so prefix each key with '--'
payload = {'--s3_target_file': s3_TARGET_FILE_PATH,
           '--s3_test_file': s3_TEST_FILE_PATH}
job_def = dict(
    JobName=MY_GLUE_PYTHONSHELL_JOB,
    Arguments=payload,
    WorkerType='Standard',
    NumberOfWorkers=2,
)

response = glue.start_job_run(**job_def)

My question is: how do I retrieve those S3 paths inside the AWS Glue Python shell job from the payload sent through boto3? Do we need to write some kind of handler, as in AWS Lambda?

Any suggestions would be appreciated.

Recommended answer

Check the AWS Glue documentation; everything you need is there.

You can use getResolvedOptions as follows:

import sys
from awsglue.utils import getResolvedOptions

# Resolve the named job arguments (passed as '--name value') from sys.argv
args = getResolvedOptions(sys.argv,
                          ['JOB_NAME',
                           'day_partition_key',
                           'hour_partition_key',
                           'day_partition_value',
                           'hour_partition_value'])
print("The day partition key is: ", args['day_partition_key'])
print("and the day partition value is: ", args['day_partition_value'])
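
Applied to the question's payload, the same approach might look like the sketch below. No Lambda-style handler is involved: a Python shell job runs the script from top to bottom, and the arguments arrive on sys.argv. The argument names s3_target_file and s3_test_file come from the question and are assumed to have been sent as '--s3_target_file' and '--s3_test_file'. The helper names (resolve_args, split_s3_uri, download, main), the argparse fallback for running outside Glue, and the /tmp destination paths are illustrative assumptions, not part of the original answer.

```python
import argparse
import sys
from urllib.parse import urlparse

# Argument names taken from the question's payload.
ARG_NAMES = ["s3_target_file", "s3_test_file"]


def resolve_args(argv, names):
    """Return {name: value} for the '--name value' pairs found in argv."""
    try:
        # Available inside the Glue runtime.
        from awsglue.utils import getResolvedOptions
        return getResolvedOptions(argv, names)
    except ImportError:
        # Assumption: local fallback, only for trying the script outside Glue.
        parser = argparse.ArgumentParser()
        for name in names:
            parser.add_argument("--" + name, required=True)
        known, _unknown = parser.parse_known_args(argv[1:])
        return vars(known)


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download(uri, local_path):
    """Download one S3 object to local_path."""
    import boto3  # imported lazily so the helpers above work without boto3
    bucket, key = split_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, local_path)


def main(argv):
    args = resolve_args(argv, ARG_NAMES)
    # Hypothetical local destinations; adjust to whatever the job needs.
    download(args["s3_target_file"], "/tmp/target_file")
    download(args["s3_test_file"], "/tmp/test_file")
```

To use this in the job script, call main(sys.argv) at the bottom of the file. The sketch assumes the payload values are full s3:// URIs; if the caller sends the bucket and key separately, split_s3_uri is unnecessary.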
