How to parse stepfunction executionId to SageMaker batch transform job name?

Problem description

I have created a Step Functions state machine. Its definition (step-function.json, shown below) is used in Terraform and follows the syntax on this page: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html

The first time I execute this state machine, it creates a SageMaker batch transform job named example-jobname. However, I need to execute the state machine every day, and every later run fails with "error": "SageMaker.ResourceInUseException", "cause": "Job name must be unique within an AWS account and region, and a job with this name already exists."

The cause is that the job name is hard-coded as example-jobname. Because job names must be unique, any execution after the first one fails. How can I append a string (something like the ExecutionId) to the end of the job name? Here's what I have tried:

  1. I added "executionId.$": "States.Format('somestring {}', $$.Execution.Id)" to the Parameters section of the JSON file, but when I executed the task I got the error "error": "States.Runtime", "cause": "An error occurred while executing the state 'SageMaker CreateTransformJob' (entered at the event id #2). The Parameters '{"BatchStrategy":"SingleRecord",.............."executionId":"somestring arn:aws:states:us-east-1:xxxxx:execution:xxxxx-state-machine:xxxxxxxx72950"}' could not be used to start the Task: [The field "executionId" is not supported by Step Functions]"}

  2. I changed the job name in the JSON file to "TransformJobName": "example-jobname-States.Format('somestring {}', $$.Execution.Id)". When I executed the state machine, it gave me the error: "error": "SageMaker.AmazonSageMakerException", "cause": "2 validation errors detected: Value 'example-jobname-States.Format('somestring {}', $$.Execution.Id)' at 'transformJobName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}; Value 'example-jobname-States.Format('somestring {}', $$.Execution.Id)' at 'transformJobName' failed to satisfy constraint: Member must have length less than or equal to 63
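
The second failure can be reproduced offline. The following is a minimal sketch that checks candidate names against the pattern and the 63-character limit quoted in the error message. The helper name `is_valid_job_name` and the sample ARN are hypothetical and used only for illustration.

```python
import re

# Pattern and length limit copied from the validation error quoted above.
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_LENGTH = 63


def is_valid_job_name(name: str) -> bool:
    """Return True if `name` satisfies the pattern and the length limit (hypothetical helper)."""
    return len(name) <= MAX_LENGTH and JOB_NAME_PATTERN.fullmatch(name) is not None


# The literal string sent in the second attempt: the key has no ".$" suffix,
# so the intrinsic function is not evaluated and the dots, quotes, braces and
# spaces reach SageMaker unchanged.
literal = "example-jobname-States.Format('somestring {}', $$.Execution.Id)"

# A made-up execution ARN: the colons alone already violate the pattern.
sample_arn = "arn:aws:states:us-east-1:123456789012:execution:my-state-machine:abc123"

print(is_valid_job_name(literal))
print(is_valid_job_name(sample_arn))
print(is_valid_job_name("example-jobname-abc123"))
```

Only the last candidate passes, which suggests the suffix has to be reduced to letters, digits and hyphens before it is appended to the job name.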

I have really run out of ideas. Can someone help, please? Many thanks.

Recommended answer

As per the documentation, the parameters should be passed in the following format:

        "Parameters": {
            "ModelName.$": "$$.Execution.Name",  
            ....
        },

If you look closely, this is what is missing from your definition: the key must end in .$ for the value to be evaluated as a path or intrinsic function instead of being passed as a literal string. So your Step Functions definition should look something like this:

Either

      "TransformJobName.$": "$$.Execution.Id",

or

      "TransformJobName.$": "States.Format('mytransformjob{}', $$.Execution.Id)"

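Note that $$.Execution.Id is the full execution ARN, which contains colons and is therefore rejected by the job-name pattern quoted in the question. A possible variant is to use $$.Execution.Name, as in the ModelName example above. The sketch below builds such a fragment in Python and prints it as JSON. The prefix example-jobname comes from the question; the assumption that the execution name only contains letters, digits and hyphens and is short enough is mine, so check it for your setup (I believe auto-generated execution names are UUIDs, but a caller can also supply a custom name).

```python
import json

# Assumption: the execution name is already a valid job-name suffix
# (letters, digits, hyphens) and short enough for the 63-character limit.
parameters_fragment = {
    # The ".$" suffix tells Step Functions to evaluate the value instead of
    # passing it through as a literal string.
    "TransformJobName.$": "States.Format('example-jobname-{}', $$.Execution.Name)",
}

# Print the fragment so it can be pasted into the Parameters section.
print(json.dumps(parameters_fragment, indent=4))
```

If the execution name cannot be trusted to match the pattern, cleaning it in a Lambda function, as in the definition that follows, is the safer route.
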
Complete state machine definition:

    {
        "Comment": "Defines the statemachine.",
        "StartAt": "Generate Random String",
        "States": {
            "Generate Random String": {
                "Type": "Task",
                "Resource": "arn:aws:lambda:eu-central-1:1234567890:function:randomstring",
                "ResultPath": "$.executionid",
                "Parameters": {
                    "executionId.$": "$$.Execution.Id"
                },
                "Next": "SageMaker CreateTransformJob"
            },
            "SageMaker CreateTransformJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                "Parameters": {
                    "BatchStrategy": "SingleRecord",
                    "DataProcessing": {
                        "InputFilter": "$",
                        "JoinSource": "Input",
                        "OutputFilter": "xxx"
                    },
                    "Environment": {
                        "SAGEMAKER_MODEL_SERVER_TIMEOUT": "300"
                    },
                    "MaxConcurrentTransforms": 100,
                    "MaxPayloadInMB": 1,
                    "ModelName": "${model_name}",
                    "TransformInput": {
                        "DataSource": {
                            "S3DataSource": {
                                "S3DataType": "S3Prefix",
                                "S3Uri": "${s3_input_path}"
                            }
                        },
                        "ContentType": "application/jsonlines",
                        "CompressionType": "Gzip",
                        "SplitType": "Line"
                    },
                    "TransformJobName.$": "$.executionid",
                    "TransformOutput": {
                        "S3OutputPath": "${s3_output_path}",
                        "Accept": "application/jsonlines",
                        "AssembleWith": "Line"
                    },
                    "TransformResources": {
                        "InstanceType": "xxx",
                        "InstanceCount": 1
                    }
                },
                "End": true
            }
        }
    }

In the definition above, the Lambda could be a function that parses the execution ID ARN passed in via the Parameters section and returns only its last segment (the full ARN contains colons, which the job-name pattern does not allow):

    def lambda_handler(event, context):
        # Keep only the last colon-separated segment of the execution ARN.
        return event.get('executionId').split(':')[-1]

Or, if you don't want to pass the execution ID, it can simply return a random string, for example:

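    import random
    import string

    def lambda_handler(event, context):
        # Pick 10 random uppercase letters/digits (the length 10 is an arbitrary choice).
        return ''.join(random.choices(string.ascii_uppercase + string.digits, k=10))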

You can generate any kind of random string, or anything else, in the Lambda and pass the result to the transform job name.
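
As one illustration of that flexibility, here is a minimal sketch of a handler that returns a complete job name rather than just a suffix. The helper `build_job_name`, the default prefix, and the sanitising and truncation rules are my assumptions, chosen to satisfy the pattern and the 63-character limit quoted in the question.

```python
import re

MAX_LENGTH = 63  # limit quoted in the SageMaker validation error


def build_job_name(execution_arn: str, prefix: str = "example-jobname") -> str:
    """Derive a job name from an execution ARN (hypothetical helper)."""
    # The execution name is the last colon-separated segment of the ARN.
    execution_name = execution_arn.split(":")[-1]
    # Replace anything other than letters, digits and hyphens with a hyphen.
    cleaned = re.sub(r"[^a-zA-Z0-9-]", "-", execution_name)
    # Truncate to the limit, then drop hyphens at either end so the name
    # starts and ends with an alphanumeric character.
    return f"{prefix}-{cleaned}"[:MAX_LENGTH].strip("-")


def lambda_handler(event, context):
    # "executionId" is the key set in the Parameters of the Lambda task state.
    return build_job_name(event["executionId"])
```

With this handler the state machine can keep "TransformJobName.$": "$.executionid" unchanged. One caveat: truncation can remove the part of a long execution name that makes it unique, so keep the prefix short.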
