AWS - want to upload multiple files to S3 and only when all are uploaded trigger a lambda function

Question

I am seeking advice on what's the best way to design this -

Use Case

I want to put multiple files into S3. Once all files are successfully saved, I want to trigger a lambda function to do some other work.

Naive Approach

The way I am approaching this is by saving a record in Dynamo that contains a unique identifier and the total number of records I will be uploading along with the keys that should exist in S3.
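
For concreteness, here's a minimal sketch of what that manifest record could look like with boto3. The table name upload_batches and all of the attribute names (batch_id, expected_keys, total_files, files_uploaded) are hypothetical, not anything prescribed by AWS:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("upload_batches")  # hypothetical table name

    def create_batch_manifest(batch_id, s3_keys):
        """Record the batch up front: a unique id, the S3 keys that
        should eventually exist, and how many files to wait for."""
        table.put_item(
            Item={
                "batch_id": batch_id,        # unique identifier
                "expected_keys": s3_keys,    # keys that should appear in S3
                "total_files": len(s3_keys),
                "files_uploaded": 0,         # counter, useful for Option 3 below
            }
        )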

A basic implementation would be to take my existing lambda function, which is invoked anytime my S3 bucket is written into, and have it manually check whether all the other files have been saved.

The Lambda function would know (look in Dynamo to determine what we're looking for) and query S3 to see if the other files are in. If so, use SNS to trigger my other lambda that will do the other work.
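
A rough sketch of that naive handler, assuming the hypothetical manifest table above and that the batch id can be recovered from the uploaded object's key (both are assumptions, not part of the original design):

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("upload_batches")  # hypothetical

    def handler(event, context):
        # S3 put notifications carry the bucket and key of the new object.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]

        # Assumption: the batch id is the first path segment of the key.
        batch_id = key.split("/")[0]
        manifest = table.get_item(Key={"batch_id": batch_id})["Item"]

        # The repetitive part: probe S3 for every expected key, on every event.
        for expected in manifest["expected_keys"]:
            try:
                s3.head_object(Bucket=bucket, Key=expected)
            except ClientError:
                return  # at least one file is still missing; nothing to do yet

        # Everything is present: hand off to the next Lambda via SNS.
        boto3.client("sns").publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:batch-complete",  # placeholder
            Message=batch_id,
        )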

Edit: Another approach is to have my client program that puts the files in S3 be responsible for directly invoking the other lambda function, since technically it knows when all the files have been uploaded. The issue with this approach is that I do not want this to be the responsibility of the client program... I want the client program to not care. As soon as it has uploaded the files, it should be able to just exit out.

Thoughts

I don't think this is a good idea. Mainly because Lambda functions should be lightweight, and polling the database from within the Lambda function to get the S3 keys of all the uploaded files and then checking in S3 whether they are there - doing this on every invocation seems hacky and very repetitive.

What's the better approach? I was thinking something like using SWF but am not sure if that's overkill for my solution or if it will even let me do what I want. The documentation doesn't show real "examples" either. It's just a discussion without much of a step by step guide (perhaps I'm looking in the wrong spot).

Edit: In response to mbaird's suggestions below:

Option 1 (SNS): This is what I will go with. It's simple and doesn't really violate the Single Responsibility Principle. That is, the client uploads the files and sends a notification (via SNS) that its work is done.

Option 2 (Dynamo streams): So this is essentially another "implementation" of Option 1. The client makes a service call, which in this case results in a table update rather than an SNS notification (Option 1). This update would trigger the Lambda function, as opposed to a notification. Not a bad solution, but I prefer using SNS for communication rather than relying on a database's capability (in this case Dynamo streams) to call a Lambda function.

In any case, I'm using AWS technologies and have coupling with their offering (Lambda functions, SNS, etc.) but I feel relying on something like Dynamo streams is making it an even tighter coupling. Not really a huge concern for my use case but still feels dirty ;D

Option 3 (S3 triggers): My concern here is the possibility of race conditions. For example, if multiple files are being uploaded by the client simultaneously (think of several async uploads fired off at once with varying file sizes), what if two files happen to finish uploading at around the same time, and two or more Lambda functions (or whatever implementation we use) query Dynamo and both get back N as the completed-upload count (instead of N and N+1)? Now even though the final result should be N+2, each one would add 1 to N. Nooooooooooo!

So Option 1 wins.

Answer

If you don't want the client program responsible for invoking the Lambda function directly, then would it be OK if it did something a bit more generic?

Option 1: (SNS) What if it simply notified an SNS topic that it had completed a batch of S3 uploads? You could subscribe your Lambda function to that SNS topic.
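
A sketch of what that client-side notification and the subscribed Lambda might look like. The topic ARN is a placeholder, and passing the batch id as the message body is just one possible convention:

    import boto3

    sns = boto3.client("sns")

    def notify_batch_complete(batch_id):
        """Client side: called once, after the last S3 upload succeeds."""
        sns.publish(
            TopicArn="arn:aws:sns:us-east-1:123456789012:uploads-complete",  # placeholder
            Message=batch_id,
        )

    def handler(event, context):
        """Lambda side: subscribed to the topic above."""
        # SNS wraps the payload in Records[].Sns.Message.
        batch_id = event["Records"][0]["Sns"]["Message"]
        # ... do the downstream work for batch_id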

Option 2: (DynamoDB Streams) What if it simply updated the DynamoDB record with something like an attribute record.allFilesUploaded = true. You could have your Lambda function trigger off the DynamoDB stream. Since you are already creating a DynamoDB record via the client, this seems like a very simple way to mark the batch of uploads as complete without having to code in knowledge about what needs to happen next. The Lambda function could then check the "allFilesUploaded" attribute instead of having to go to S3 for a file listing every time it is called.
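
As a sketch, reusing the same hypothetical table; note the table's stream must be configured with a view type that includes the new image (e.g. NEW_IMAGE):

    import boto3

    table = boto3.resource("dynamodb").Table("upload_batches")  # hypothetical

    def mark_batch_complete(batch_id):
        """Client side: flip the flag once the last upload returns."""
        table.update_item(
            Key={"batch_id": batch_id},
            UpdateExpression="SET allFilesUploaded = :t",
            ExpressionAttributeValues={":t": True},
        )

    def stream_handler(event, context):
        """Lambda side: invoked by the table's stream."""
        for record in event["Records"]:
            if record["eventName"] != "MODIFY":
                continue
            # Stream images arrive in DynamoDB's typed wire format.
            new_image = record["dynamodb"]["NewImage"]
            if new_image.get("allFilesUploaded", {}).get("BOOL"):
                batch_id = new_image["batch_id"]["S"]
                # ... kick off the downstream work for batch_id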

Alternatively, don't insert the DynamoDB record until all files have finished uploading, then your Lambda function could just trigger off new records being created.

Option 3: (continuing to use S3 triggers) If the client program can't be changed from how it works today, then instead of listing all the S3 files and comparing them to the list in DynamoDB each time a new file appears, simply update the DynamoDB record via an atomic counter (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithItems.html#WorkingWithItems.AtomicCounters). Then compare the result value against the size of the file list. Once the values are the same you know all the files have been uploaded. The down side to this is that you need to provision enough capacity on your DynamoDB table to handle all the updates, which is going to increase your costs.
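
A sketch of the counter variant, reusing the hypothetical files_uploaded and total_files attributes from earlier. ADD increments atomically, so concurrent invocations each see a distinct updated value and exactly one of them observes the final count - which addresses the race described in the question's edit:

    import boto3

    table = boto3.resource("dynamodb").Table("upload_batches")  # hypothetical

    def handler(event, context):
        """Fires on every S3 put; safe under concurrent uploads."""
        key = event["Records"][0]["s3"]["object"]["key"]
        batch_id = key.split("/")[0]  # assumption: batch id is the key prefix

        # ADD increments atomically; ALL_NEW returns the item after the update.
        result = table.update_item(
            Key={"batch_id": batch_id},
            UpdateExpression="ADD files_uploaded :one",
            ExpressionAttributeValues={":one": 1},
            ReturnValues="ALL_NEW",
        )
        item = result["Attributes"]

        if int(item["files_uploaded"]) == int(item["total_files"]):
            # Only the invocation that performed the final increment gets here.
            boto3.client("sns").publish(
                TopicArn="arn:aws:sns:us-east-1:123456789012:batch-complete",  # placeholder
                Message=batch_id,
            )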

Also, I agree with you that SWF is overkill for this task.
