Pass Variables in AWS Step Functions

Problem description

I am very new to AWS Step Functions and AWS Lambda functions and could really use some help. I have a state machine in which I am trying to check whether a certain file exists in my S3 bucket, then have the state machine follow one path if the file exists and a different path if it does not.

The following shows the beginning of my state machine definition that covers this issue:

{
  "Comment": "This is a test for running the structure of the CustomCreate job.",
  "StartAt": "PreStep",
  "States": {
    "PreStep": {
      "Comment": "Check that all the necessary files exist before running the job.",
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:XXXXXXXXXX:function:CustomCreate-PreStep-Function",
      "Next": "Run Job Choice"
    },
    "Run Job Choice": {
      "Comment": "This step chooses whether or not to go forward with running the main job.",
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.FoundNecessaryFiles",
          "BooleanEquals": true,
          "Next": "Spin Up Cluster"
        },
        {
          "Variable": "$.FoundNecessaryFiles",
          "BooleanEquals": false,
          "Next": "Do Not Run Job"
        }
      ]
    },
    "Do Not Run Job": {
      "Comment": "This step triggers if the PreStep fails and the job should not run.",
      "Type": "Fail",
      "Cause": "PreStep unsuccessful"
    },
    "Spin Up Cluster": {
      "Comment": "Spins up the EMR Cluster.",
      "Type": "Pass",
      "Next": "Update Env"
    },
    "Update Env": {
      "Comment": "Update the environment variables in the EMR Cluster.",
      "Type": "Pass",
      "Next": "Run Job"
    },
    "Run Job": {
      "Comment": "Add steps to the EMR Cluster.",
      "Type": "Pass",
      "End": true
    }
  }
}

The following code is my CustomCreate-PreStep-Function Lambda function:

exports.handler = async function(event, context, callback) {
     var AWS = require('aws-sdk');
     var s3 = new AWS.S3();
     var params = {Bucket: 'BUCKET_NAME', Key: 'FILE_NAME'};
     s3.getObject(params, function(err, data) {

        if (err) {
            console.log(err, err.stack);
            // file does not exist
            console.log("failed");
            callback(null,false);
        }
        else {
            console.log(data);
            //file exist
            console.log("succeeded");
            var FoundNecessaryFiles = true;
            // return FoundNecessaryFiles;
            callback(null,event.FoundNecessaryFiles=true);
        }    
    });
};

I have tried this in a number of ways but have been unable to get it working. As you can see, I am trying to have the Lambda function pass back the variable FoundNecessaryFiles as true or false, depending on whether the file was found, and then use that value to guide the choice in the next step. I would prefer to solve this by fixing the variable pass-back, since I may need that method again later in the state machine, but I am also willing to accept another solution, whether that means combining the steps or anything else that works.
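For readers following along, here is one way to reason about the data flow. Whatever the Lambda returns becomes the output of the PreStep Task state and therefore the input of the Run Job Choice state, where `$.FoundNecessaryFiles` is looked up. The sketch below is a simplified model of that lookup, not the real Step Functions engine. The helper name `runJobChoice` and the error strings are illustrative assumptions.

```javascript
// Rough model of the "Run Job Choice" state above; not the real Step Functions engine.
// `runJobChoice` is a hypothetical helper name used only for this sketch.
function runJobChoice(taskOutput) {
  // "$.FoundNecessaryFiles" can only resolve if the Task output is a JSON object with that key.
  const isObject = taskOutput !== null && typeof taskOutput === 'object';
  if (!isObject || !('FoundNecessaryFiles' in taskOutput)) {
    throw new Error("Invalid path '$.FoundNecessaryFiles'");
  }
  if (taskOutput.FoundNecessaryFiles === true) return 'Spin Up Cluster';
  if (taskOutput.FoundNecessaryFiles === false) return 'Do Not Run Job';
  // No rule matched and the state defines no Default branch.
  throw new Error('States.NoChoiceMatched');
}

// An object carrying the flag can be routed either way.
console.log(runJobChoice({ FoundNecessaryFiles: true }));
console.log(runJobChoice({ FoundNecessaryFiles: false }));

// A bare boolean has no FoundNecessaryFiles key, so the lookup fails.
try {
  runJobChoice(false);
} catch (err) {
  console.log(err.message);
}
```

Under this model, a Lambda that returns a bare `true` or `false` gives the Choice state nothing to look up. The flag has to be a key on the returned object.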

Also, my next step in this process will be to spin up an AWS EMR cluster, provided that the proper files exist, and I am very unclear on how to accomplish that. I would be very appreciative if anyone could provide assistance with running an AWS EMR cluster using Step Functions as well.

Recommended answer

I solved my initial problem of passing the variable. However, I could still really use some help getting an EMR cluster running through Step Functions.

For those of you who may encounter a similar problem, I solved my variable-passing issue by changing my Lambda function to the following. The handler is no longer declared async, and it returns the whole event object with FoundNecessaryFiles set on it rather than a bare boolean:

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Non-async handler: the result is delivered through `callback`.
exports.handler = function(event, context, callback) {
    var params = {Bucket: 'BUCKET_NAME', Key: 'FILE_NAME'};
    s3.getObject(params, function(err, data) {
        if (err) {
            // File does not exist (or could not be read).
            console.log(err, err.stack);
            console.log("failed");
            event.FoundNecessaryFiles = false;
        } else {
            // File exists.
            console.log("succeeded");
            event.FoundNecessaryFiles = true;
        }
        // Return the whole event so the Choice state can read $.FoundNecessaryFiles.
        callback(null, event);
    });
};
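A possible variant of the handler above, as a sketch rather than a drop-in replacement: it uses `async`/`await` and returns the result instead of calling `callback`. It also calls `headObject`, which requests only the object's metadata, so the file body is not downloaded. The factory name `makeHandler` and the stand-in client `fakeS3` are hypothetical names introduced so the logic can be run locally. The sketch assumes the aws-sdk v2 style `.promise()` call used by the SDK in the answer.

```javascript
// `makeHandler` is a hypothetical factory; `s3` is anything exposing the
// aws-sdk v2 style `headObject(params).promise()` call.
function makeHandler(s3, params) {
  return async function handler(event) {
    try {
      // Only metadata is requested; the promise rejects if the object cannot be found.
      await s3.headObject(params).promise();
      return Object.assign({}, event, { FoundNecessaryFiles: true });
    } catch (err) {
      // As in the answer above, any error is treated as "file not found".
      console.log(err);
      return Object.assign({}, event, { FoundNecessaryFiles: false });
    }
  };
}

// In Lambda this would be wired up roughly as (assumes aws-sdk v2 is available):
//   const AWS = require('aws-sdk');
//   exports.handler = makeHandler(new AWS.S3(), {Bucket: 'BUCKET_NAME', Key: 'FILE_NAME'});

// Local check with a stand-in client, so no AWS access is needed.
const fakeS3 = (exists) => ({
  headObject: () => ({
    promise: () => (exists ? Promise.resolve({}) : Promise.reject(new Error('NotFound'))),
  }),
});

makeHandler(fakeS3(true), {})({ jobName: 'demo' }).then((out) => console.log(out));
makeHandler(fakeS3(false), {})({ jobName: 'demo' }).then((out) => console.log(out));
```

Either way, the returned value is an object that keeps the incoming event's fields and adds FoundNecessaryFiles, which is what the Choice state reads.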

My next issue is setting up an AWS EMR cluster. My first task is to spin up the cluster. This could be done directly in the Step Functions JSON or, preferably, by using a JSON cluster config file that I have in my S3 bucket. My second task is to update the EMR cluster's environment variables. I have a .sh script in my S3 bucket that does this, but I do not know how to apply it to the EMR cluster from Step Functions. My third task is to add a step containing a spark-submit command to the EMR cluster. This command is described in a JSON config file in my S3 bucket, which can be uploaded to the EMR cluster in much the same way as the environment config file in the previous step. Finally, I want a task that makes sure the EMR cluster terminates after it completes its run.
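One possible shape for these tasks, offered as a sketch under assumptions rather than a tested setup, uses the Step Functions service integrations for EMR (`createCluster.sync`, `addStep.sync`, `terminateCluster`). The state definitions are written as a JavaScript object so they can be checked and printed as JSON. Every cluster name, release label, role, instance setting and S3 path below is a placeholder. Running the .sh script as a bootstrap action is an assumption about how the environment update could be folded into cluster creation. Loading the JSON config files from S3 is not shown.

```javascript
// Sketch only: every name, path, role and instance setting below is a placeholder assumption.
const emrStates = {
  'Spin Up Cluster': {
    Type: 'Task',
    // ".sync" makes the state wait until the cluster is ready.
    Resource: 'arn:aws:states:::elasticmapreduce:createCluster.sync',
    Parameters: {
      Name: 'CustomCreateCluster',
      ReleaseLabel: 'emr-5.30.0',
      Applications: [{ Name: 'Spark' }],
      ServiceRole: 'EMR_DefaultRole',
      JobFlowRole: 'EMR_EC2_DefaultRole',
      // Assumption: the environment script runs as a bootstrap action.
      BootstrapActions: [{
        Name: 'Update Env',
        ScriptBootstrapAction: { Path: 's3://BUCKET_NAME/update-env.sh' },
      }],
      Instances: {
        KeepJobFlowAliveWhenNoSteps: true,
        InstanceGroups: [
          { InstanceRole: 'MASTER', InstanceType: 'm5.xlarge', InstanceCount: 1 },
          { InstanceRole: 'CORE', InstanceType: 'm5.xlarge', InstanceCount: 2 },
        ],
      },
    },
    // Keep the original input and store the cluster details under $.Cluster.
    ResultPath: '$.Cluster',
    Next: 'Run Job',
  },
  'Run Job': {
    Type: 'Task',
    Resource: 'arn:aws:states:::elasticmapreduce:addStep.sync',
    Parameters: {
      'ClusterId.$': '$.Cluster.ClusterId',
      Step: {
        Name: 'spark-submit',
        ActionOnFailure: 'CONTINUE',
        HadoopJarStep: {
          Jar: 'command-runner.jar',
          Args: ['spark-submit', '--deploy-mode', 'cluster', 's3://BUCKET_NAME/job.py'],
        },
      },
    },
    ResultPath: '$.Step',
    // Route failures to termination too, so the cluster is not left running.
    Catch: [{ ErrorEquals: ['States.ALL'], ResultPath: '$.Error', Next: 'Terminate Cluster' }],
    Next: 'Terminate Cluster',
  },
  'Terminate Cluster': {
    Type: 'Task',
    Resource: 'arn:aws:states:::elasticmapreduce:terminateCluster',
    Parameters: { 'ClusterId.$': '$.Cluster.ClusterId' },
    End: true,
  },
};

// Print the states as JSON, ready to merge into the "States" section of a definition.
console.log(JSON.stringify(emrStates, null, 2));
```

In this layout the three states would stand in for the Pass placeholders in the original definition, with Update Env folded into cluster creation. The state machine's IAM role would also need permission to call the corresponding EMR actions.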

Any help on this would be greatly appreciated. Whether you follow the structure I outlined above or know of a solution that alters it, I would be happy to take any assistance.
