AWS Glue Job Input Parameters


Question

I am relatively new to AWS, and this may be a less technical question, but AWS Glue currently states that a maximum of 25 jobs may be created. We are loading a series of tables, each with its own job that subsequently appends audit columns. The jobs are very similar; only the source and target connection strings change.

Is there a way to parameterize these jobs so they can be reused, simply passing the proper connection strings to them? Or perhaps a master job could loop through a set of connection strings and call a child job, passing each connection string through?

Any examples or documentation would be most appreciated.

Recommended answer

The example below shows how to use Glue job input parameters in code. The script reads the input parameters and writes them to a flat file (a CSV in S3).

1) Set the input parameters in the job configuration. Parameter keys are entered with a leading "--", for example --VAL1.

2) Glue job code

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

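## @params: [JOB_NAME, VAL1, VAL2, VAL3, DEST_FOLDER]
# Resolve every expected job argument once; each one reaches the script as "--NAME value".
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'VAL1', 'VAL2', 'VAL3', 'DEST_FOLDER'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Build a one-row DataFrame that holds the parameter values.
columns = ['VAL1', 'VAL2', 'VAL3']
df = spark.createDataFrame([tuple(args[c] for c in columns)], columns)

# Write the row as a single semicolon-delimited CSV file with a header.
# DEST_FOLDER holds the bucket (and optional prefix) without the s3:// scheme.
df.repartition(1).write.mode('overwrite').format('csv').options(header=True, delimiter=';').save("s3://" + args['DEST_FOLDER'] + "/")

job.commit()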
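
To experiment with the argument handling outside Glue, where the awsglue module is not importable, something like the following could serve as a rough stand-in. It is only a sketch: resolve_options is a hypothetical helper written for illustration, not part of awsglue, and the bucket name in the simulated command line is a placeholder.

```python
import argparse


def resolve_options(argv, option_names):
    """Rough local stand-in for getResolvedOptions (hypothetical helper).

    Expects each option as a '--NAME value' pair and returns a plain dict
    keyed by NAME. Unknown arguments are ignored; a missing one exits with
    an argparse error.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in option_names:
        parser.add_argument("--" + name, required=True)
    # parse_known_args tolerates extra arguments the runtime may add.
    parsed, _unknown = parser.parse_known_args(argv[1:])
    return vars(parsed)


# Simulated command line, shaped like the arguments a job run receives.
fake_argv = [
    "script.py",
    "--JOB_NAME", "example_job2",
    "--VAL1", "value1",
    "--VAL2", "value2",
    "--VAL3", "value3",
    "--DEST_FOLDER", "my-bucket/output",  # placeholder value
]
args = resolve_options(fake_argv, ["JOB_NAME", "VAL1", "VAL2", "VAL3", "DEST_FOLDER"])
print(args["VAL1"], args["DEST_FOLDER"])
```

The point of the sketch is the shape of the input: every parameter arrives as a "--NAME value" pair, and the script looks values up by the bare name.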

3) Input parameters can also be supplied through boto3, CloudFormation, or Step Functions. This example shows how to do it with boto3.

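import boto3

def lambda_handler(event, context):
    glue = boto3.client('glue')

    # Create the job with default argument values. Argument keys need the '--' prefix
    # so that getResolvedOptions can find them in the script.
    myJob = glue.create_job(Name='example_job2', Role='AWSGlueServiceDefaultRole',
                            Command={'Name': 'glueetl', 'ScriptLocation': 's3://aws-glue-scripts/example_job'},
                            DefaultArguments={"--VAL1": "value1", "--VAL2": "value2", "--VAL3": "value3"})

    # Start a run; arguments passed here override the job defaults for this run only.
    # The job script in step 2 also reads DEST_FOLDER, so a real run would need '--DEST_FOLDER' as well.
    glue.start_job_run(JobName=myJob['Name'], Arguments={"--VAL1": "value11", "--VAL2": "value22", "--VAL3": "value33"})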
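
For the looping idea raised in the question, one possible approach is to keep a single parameterized job and start one run per table, changing only the arguments. The sketch below assumes this setup; start_table_loads, the SOURCE_CONNECTION and TARGET_CONNECTION parameter names, and the values in TABLE_SETTINGS are all hypothetical and would have to match whatever the job script actually reads.

```python
def start_table_loads(glue_client, job_name, table_settings):
    """Start one run of the same Glue job per table, varying only its arguments.

    glue_client: an object exposing start_job_run, e.g. boto3.client('glue').
    table_settings: list of dicts mapping parameter names to values.
    Returns the list of JobRunId values reported by Glue.
    """
    run_ids = []
    for settings in table_settings:
        # Glue expects argument keys prefixed with '--'.
        arguments = {"--" + key: value for key, value in settings.items()}
        response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
        run_ids.append(response["JobRunId"])
    return run_ids


# Hypothetical per-table settings; the parameter names are illustrative only.
TABLE_SETTINGS = [
    {"SOURCE_CONNECTION": "source_a", "TARGET_CONNECTION": "target_a"},
    {"SOURCE_CONNECTION": "source_b", "TARGET_CONNECTION": "target_b"},
]
```

It could be called as start_table_loads(boto3.client('glue'), 'example_job2', TABLE_SETTINGS). Since the runs are started back to back, the job's maximum concurrent runs setting would probably need to be raised above 1, otherwise the later calls are likely to be rejected.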

Useful links:

  1. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html
  2. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
  3. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client.create_job
  4. https://docs.aws.amazon.com/step-functions/latest/dg/connectors-glue.html
