Retrieving an S3 path from the payload inside an AWS Glue Python shell job
Question
I have a Python shell job in AWS Glue that needs to download a file from an S3 path. The S3 path is variable, so it is passed to the Glue job as a payload in the start_job_run call, as shown below:
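import boto3

# Create the Glue client used to start the job run
glue = boto3.client('glue')

# Glue expects job argument names to be prefixed with '--'
payload = {'--s3_target_file': s3_TARGET_FILE_PATH,
           '--s3_test_file': s3_TEST_FILE_PATH}
job_def = dict(
    JobName=MY_GLUE_PYTHONSHELL_JOB,
    Arguments=payload,
    WorkerType='Standard',
    NumberOfWorkers=2,
)
response = glue.start_job_run(**job_def)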
My question is: how do I retrieve those S3 paths from the payload inside the AWS Glue Python shell job when it is started through boto3? Do we need to write some sort of handler, as in AWS Lambda?
Any suggestions would be appreciated.
Recommended answer
Check the AWS Glue documentation; everything you need is there.
You can use getResolvedOptions as follows:
import sys
from awsglue.utils import getResolvedOptions

# List the argument names without the leading '--'
args = getResolvedOptions(sys.argv,
                          ['JOB_NAME',
                           'day_partition_key',
                           'hour_partition_key',
                           'day_partition_value',
                           'hour_partition_value'])
print("The day partition key is: ", args['day_partition_key'])
print("and the day partition value is: ", args['day_partition_value'])
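Applied to the question, a minimal sketch might look like the following. A Python shell job runs the script from top to bottom, so no Lambda-style handler is needed. The sketch assumes the caller passes the keys with a leading `--` (for example `'--s3_target_file'`) and that the values are full `s3://bucket/key` URIs. The helper names `split_s3_path` and `download_inputs`, and the `/tmp` download location, are hypothetical choices made for illustration and are not part of the Glue API.

```python
import os
import sys


def split_s3_path(s3_path):
    """Split 's3://bucket/some/key.csv' into ('bucket', 'some/key.csv')."""
    prefix = "s3://"
    if not s3_path.startswith(prefix):
        raise ValueError(f"not an s3:// path: {s3_path!r}")
    bucket, _, key = s3_path[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key in: {s3_path!r}")
    return bucket, key


def download_inputs(argv):
    """Read the two S3 paths from the job arguments and download both files."""
    # Imported here because awsglue is only available inside the Glue runtime.
    import boto3
    from awsglue.utils import getResolvedOptions

    # Names are listed without the leading '--' used by the caller.
    args = getResolvedOptions(argv, ["s3_target_file", "s3_test_file"])

    s3 = boto3.client("s3")
    local_files = {}
    for name in ("s3_target_file", "s3_test_file"):
        bucket, key = split_s3_path(args[name])
        # Assumption: /tmp is writable and the two file names do not collide.
        local_path = os.path.join("/tmp", os.path.basename(key))
        s3.download_file(bucket, key, local_path)
        local_files[name] = local_path
    return local_files


# In the Glue script itself, call this at module level:
# local_files = download_inputs(sys.argv)
```

The Glue-specific imports sit inside `download_inputs` so that the path-splitting helper can be tried locally, where `awsglue` is not installed.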