Integrating Amazon SageMaker endpoints into batch ETL workflows on Glue or EMR


Problem description

What is the best way to call an AWS SageMaker ML model endpoint from Spark jobs running on Glue or EMR?

As shown in the AWS documentation, an endpoint named 'linear-learner-2019-11-04-01-57-20-572' has been created. It can be invoked as follows:

  import boto3
  client = boto3.client('sagemaker-runtime')
  response = client.invoke_endpoint(EndpointName='linear-learner-2019-11-04-01-57-20-572',
                                    ContentType='text/csv', Body=values)  # values: CSV string of features

However, suppose we have a batch job like this:

  • A scheduled batch job over big data reads its input from S3.
  • The data undergoes a transformation that adds a new column containing the prediction.
  • The result is written back to S3 as output.
  • The job could be triggered daily, or whenever a new file arrives in the source folder.
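One common pattern for the Spark step (an assumption, not something the question specifies) is to call the endpoint once per batch of rows inside `mapPartitions`, so each executor reuses one client instead of making one network call per row. The sketch below uses only the standard library. The helper name `predict_partition`, the `batch_size` parameter, and the `{"predictions": [{"score": ...}]}` response shape (the JSON that Linear Learner is documented to return for `Accept: application/json`) are assumptions to check against your own model. `client` is any object with a boto3-style `invoke_endpoint` method.

```python
import csv
import io
import json
from itertools import islice


def _to_csv(rows):
    """Serialize a batch of feature rows (sequences of numbers) to header-less CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def predict_partition(rows, client, endpoint_name, batch_size=500):
    """Yield (features, score) pairs, sending rows to the endpoint in batches.

    Hypothetical helper: assumes each row is a list of numeric features and that
    the endpoint answers with JSON shaped like {"predictions": [{"score": ...}, ...]}.
    """
    rows = iter(rows)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        response = client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Accept="application/json",
            Body=_to_csv(batch),
        )
        # In boto3, Body is a streaming object; read() returns the raw bytes.
        predictions = json.loads(response["Body"].read())["predictions"]
        for features, pred in zip(batch, predictions):
            yield list(features), pred["score"]
```

In a PySpark job on Glue or EMR this could be applied with something like `df.rdd.map(list).mapPartitions(...)`, creating `boto3.client("sagemaker-runtime")` inside the partition function rather than on the driver, because boto3 clients are not meant to be serialized to executors.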

How can we best integrate the above endpoint into Spark jobs running on Glue or EMR?

Recommended answer

You can use AWS Step Functions to build a workflow that triggers each task (EMR, Glue, Athena, SageMaker, etc.) one after another. For the batch tasks, I recommend you consider launching a SageMaker Processing job or a SageMaker Batch Transform (batch inference) job.
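As a rough illustration of that recommendation, the Python below builds an Amazon States Language definition with two steps: run a Glue ETL job, then run a SageMaker Batch Transform job over its output. The Glue job name, model name, S3 paths, and instance type are caller-supplied placeholders. The `.sync` service integrations make Step Functions wait for each job to finish before moving on. Note that Batch Transform runs against the SageMaker model behind the endpoint (`ModelName`), not the endpoint itself. The resulting JSON could be registered with `boto3.client("stepfunctions").create_state_machine(...)` and started from a daily EventBridge schedule or an S3 "object created" event.

```python
import json


def build_state_machine(glue_job_name, model_name, input_s3_uri, output_s3_uri,
                        instance_type="ml.m5.large"):
    """Return an ASL definition (JSON string): Glue ETL job, then Batch Transform.

    Hypothetical sketch; every argument value is a placeholder chosen by the caller.
    """
    definition = {
        "Comment": "Hypothetical ETL + batch inference pipeline",
        "StartAt": "RunGlueETL",
        "States": {
            "RunGlueETL": {
                "Type": "Task",
                # .sync waits until the Glue job run reaches a terminal state.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "BatchTransform",
            },
            "BatchTransform": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "Parameters": {
                    # Transform job names must be unique; reuse the execution name
                    # (assumes it is also a valid SageMaker job name).
                    "TransformJobName.$": "$$.Execution.Name",
                    "ModelName": model_name,
                    "TransformInput": {
                        "DataSource": {
                            "S3DataSource": {
                                "S3DataType": "S3Prefix",
                                "S3Uri": input_s3_uri,
                            }
                        },
                        "ContentType": "text/csv",
                        "SplitType": "Line",  # send the CSV input line by line
                    },
                    "TransformOutput": {"S3OutputPath": output_s3_uri},
                    "TransformResources": {
                        "InstanceCount": 1,
                        "InstanceType": instance_type,
                    },
                },
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)
```

Chaining the Glue job and the transform job this way keeps the ETL code free of endpoint calls and avoids keeping a real-time endpoint running just for a daily batch.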
