Integrating Amazon SageMaker endpoints into batch ETL workflows on Glue or EMR
Question
What is the best way to call an AWS SageMaker ML model endpoint from Spark jobs running on Glue or EMR?
As shown in the AWS documentation, an endpoint named 'linear-learner-2019-11-04-01-57-20-572' has been created. It can be invoked as:
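import boto3
client = boto3.client('sagemaker-runtime')  # runtime client used for inference calls
response = client.invoke_endpoint(EndpointName='linear-learner-2019-11-04-01-57-20-572',
                                  ContentType='text/csv', Body=values)  # values: CSV-formatted feature rows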
But suppose we have a batch job like this:
- A scheduled batch job over a large dataset reads its data from S3.
- The data undergoes a transformation that adds a new column holding the prediction.
- The result is written back to S3.
- The job could be triggered daily, or on arrival of a new file in the source folder.
How can we best use the endpoint above from Spark jobs on Glue or EMR?
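To make the scenario concrete, here is a minimal sketch of what calling the endpoint from inside a Spark job might look like: rows are grouped into small batches per partition, sent as CSV, and paired with the returned predictions. The helper names `predict_partition` and `_invoke` and the `batch_size` default are hypothetical, and the code assumes the endpoint returns JSON with a top-level "predictions" list (the usual shape for the built-in Linear Learner, but worth verifying for your model).

```python
import json


def _invoke(batch, client, endpoint_name):
    """Send one batch of rows as CSV and pair each row with its prediction."""
    body = "\n".join(",".join(str(v) for v in row) for row in batch)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=body,
    )
    # Assumption: the response body is JSON shaped like {"predictions": [...]}.
    predictions = json.loads(response["Body"].read())["predictions"]
    return zip(batch, predictions)


def predict_partition(rows, client, endpoint_name, batch_size=100):
    """Yield (row, prediction) pairs for an iterable of numeric feature rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield from _invoke(batch, client, endpoint_name)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield from _invoke(batch, client, endpoint_name)


# Hypothetical use inside a Spark job (the client is created on the executor,
# because boto3 clients cannot be pickled and shipped from the driver):
#
#   import boto3
#   scored = rdd.mapPartitions(
#       lambda rows: predict_partition(
#           rows,
#           boto3.client("sagemaker-runtime"),
#           "linear-learner-2019-11-04-01-57-20-572",
#       )
#   )
```

Batching keeps the number of HTTP calls down, but each request still has to stay within the endpoint's payload limit, and a large job can easily overload a small real-time endpoint.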
Recommended answer
You can use AWS Step Functions to create a workflow of actions and trigger each task one after the other (EMR, Glue, Athena, SageMaker, etc.). For batch workloads, I recommend launching a SageMaker Processing job or a SageMaker Batch Transform (batch inference) job.
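As a rough illustration of this suggestion, the sketch below builds the request for a SageMaker Batch Transform job and wraps it, after a Glue ETL step, in a Step Functions state machine definition. The function names, job names, model name, S3 paths, and instance type are all hypothetical placeholders. It also assumes the model can return CSV output, so that `JoinSource: "Input"` appends each prediction to its input record. Note that Batch Transform works from a SageMaker model name rather than an endpoint name.

```python
import json


def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Return keyword arguments for SageMaker's CreateTransformJob API."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,  # a SageMaker model, not an endpoint
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            "Accept": "text/csv",  # assumption: the model can emit CSV
            "AssembleWith": "Line",
        },
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
        # Join each prediction onto its input record in the output.
        "DataProcessing": {"JoinSource": "Input"},
    }


def build_state_machine_definition(glue_job_name, transform_request):
    """Return an Amazon States Language definition: Glue ETL, then Batch Transform."""
    definition = {
        "Comment": "Run a Glue ETL job, then score its output with Batch Transform",
        "StartAt": "RunGlueEtl",
        "States": {
            "RunGlueEtl": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "RunBatchTransform",
            },
            "RunBatchTransform": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "Parameters": transform_request,
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    request = build_transform_job_request(
        job_name="daily-scoring-example",         # hypothetical
        model_name="my-linear-learner-model",     # hypothetical
        input_s3_uri="s3://my-bucket/etl-output/",   # hypothetical
        output_s3_uri="s3://my-bucket/predictions/", # hypothetical
    )
    print(build_state_machine_definition("prepare-features", request))
    # To start the transform job directly instead of via Step Functions:
    #   import boto3
    #   boto3.client("sagemaker").create_transform_job(**request)
```

Transform job names must be unique, so a scheduled workflow would need to generate a fresh name per run (for example from the execution input) instead of the fixed name used here. The daily or on-arrival trigger from the question could then start the state machine, for instance through an EventBridge schedule or an S3 event.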