How to Trigger Glue ETL Pyspark job through S3 Events or AWS Lambda?


Problem Description

I'm planning to write certain jobs in AWS Glue ETL using Pyspark, which I want to get triggered when a new file is dropped in an AWS S3 location, just like we trigger AWS Lambda functions using S3 events.

But I see only very limited options for triggering a Glue ETL script. Any help on this would be highly appreciated.

Recommended Answer

The following should work to trigger a Glue job from AWS Lambda. Configure the Lambda function with the appropriate S3 bucket as its event source, and assign IAM roles/permissions to the Lambda so that it can start the AWS Glue job on the user's behalf.
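Hooking the S3 event up to the Lambda can be done from the console; as a minimal sketch of scripting the same wiring with boto3 (the bucket name and function ARN below are placeholders, and S3 must already have been granted permission to invoke the function):

import boto3

s3 = boto3.client('s3')

# Placeholder bucket name and Lambda ARN -- substitute your own values.
# S3 must already be allowed to invoke the function (lambda add-permission),
# otherwise this call is rejected.
s3.put_bucket_notification_configuration(
    Bucket='your-source-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:start-glue-job',
            'Events': ['s3:ObjectCreated:*'],
        }]
    },
)

With the notification in place, a handler along these lines starts the Glue job: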

import boto3

print('Loading function')

def lambda_handler(event, context):
    # Pull the bucket and object key out of the S3 event record
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    source_key = event['Records'][0]['s3']['object']['key']

    glue = boto3.client('glue')
    gluejobname = "YOUR GLUE JOB NAME"

    try:
        # Start the Glue job and report its initial run state
        runId = glue.start_job_run(JobName=gluejobname)
        status = glue.get_job_run(JobName=gluejobname, RunId=runId['JobRunId'])
        print("Job Status : ", status['JobRun']['JobRunState'])
    except Exception as e:
        print(e)
        print('Error starting Glue job for object {} in bucket {}. Make sure '
              'they exist and your bucket is in the same region as this '
              'function.'.format(source_key, source_bucket))
        raise e
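If the Glue job needs to know which object triggered it, start_job_run also accepts an Arguments map; a hedged sketch (the --source_bucket / --source_key parameter names are illustrative, not anything Glue requires):

runId = glue.start_job_run(
    JobName=gluejobname,
    Arguments={
        # Illustrative parameter names; choose your own and read them
        # back inside the Glue script with getResolvedOptions.
        '--source_bucket': source_bucket,
        '--source_key': source_key,
    },
)

Inside the Glue PySpark script, the values can then be resolved like so:

import sys
from awsglue.utils import getResolvedOptions

# Argument names are passed to getResolvedOptions without the leading '--'
args = getResolvedOptions(sys.argv, ['source_bucket', 'source_key'])
input_path = 's3://{}/{}'.format(args['source_bucket'], args['source_key'])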
