Can I invoke Lambda functions in parallel using a single Kinesis shard if record order doesn't matter?

Question

I have an application that needs only the bandwidth of one Kinesis shard, but it needs many Lambda function invocations running in parallel to keep up with record processing. My records are on the large side (some of them approach the 1000 KB limit), but the incoming rate is only 1 MB/s, because a single EC2 instance populates the stream. Since each record contains an internal timestamp, I don't care about processing them in order. Basically, I have several months' worth of data to migrate, and I want to migrate it in parallel.

The processed records feed a database cluster that can handle 1000 concurrent clients, so my previous solution was to split the Kinesis stream into 50 shards. That has proved expensive, since the only thing I need the shards for is to parallelize the processing. I'm using less than 1% of the bandwidth, and I had to increase the retention period.

Long term, I imagine the answer involves splitting my records up, so that the consumption time isn't such a huge multiple of the production time. That's not an option right now, but I realize I'm abusing the system slightly.

Is there a way to have one order-preserving Lambda function associated with a single-shard Kinesis stream, and let it invoke another Lambda function asynchronously on each batch of records? Then I could use a single Kinesis shard (or some other data source) and still enjoy massively parallel processing.
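To make the idea above concrete, here is a minimal sketch of such a dispatcher in Python, assuming boto3's Lambda client and its `invoke` call with `InvocationType="Event"` (asynchronous invocation). The worker name `record-worker`, the chunk size, and the payload shape are all hypothetical, chosen only for illustration. Note that asynchronous invocations have a smaller payload size limit than synchronous ones, so records as large as the ones described here might have to be staged elsewhere and passed by reference; the sketch ignores that.

```python
import json

# Hypothetical values, not taken from the question.
WORKER_FUNCTION_NAME = "record-worker"
CHUNK_SIZE = 1  # records are large, so hand each worker only a few


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def dispatch(records, lambda_client,
             function_name=WORKER_FUNCTION_NAME, chunk_size=CHUNK_SIZE):
    """Fire one asynchronous worker invocation per chunk of Kinesis records.

    Returns the number of invocations made. The record data is forwarded
    still base64-encoded, exactly as Lambda delivers it in the Kinesis event.
    """
    invocations = 0
    for group in chunked(records, chunk_size):
        payload = {"records": [r["kinesis"]["data"] for r in group]}
        lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous: do not wait for the worker
            Payload=json.dumps(payload).encode("utf-8"),
        )
        invocations += 1
    return invocations


def handler(event, context):
    """Entry point for the function attached to the single-shard stream."""
    import boto3  # imported here so dispatch() can be exercised without AWS
    return dispatch(event["Records"], boto3.client("lambda"))
```

If `invoke` raises partway through a batch, the handler fails and the whole batch is retried from the stream, so some chunks may be dispatched twice. The worker would therefore need to tolerate duplicates.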

Really, all I need is an option in the Lambda event source configuration for Kinesis that says "I don't care about preserving the order of these records." But then I suppose keeping track of the iterator position after failed executions becomes more of a challenge.

Recommended answer

According to somebody who works at AWS, it is possible to attach several Lambda functions to the same Kinesis stream. That said, I'm testing it and have had no success so far.

It works fine.
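For reference, attaching several functions to one stream could look roughly like the sketch below, which assumes boto3's `create_event_source_mapping` call. The helper name `attach_functions`, the stream ARN, and the function names are hypothetical. Keep in mind that each mapping reads the stream independently, so every attached function receives every record rather than a share of them; on its own this does not split the work.

```python
def attach_functions(lambda_client, stream_arn, function_names, batch_size=100):
    """Create one Kinesis event source mapping per function.

    Returns the UUIDs of the mappings that were created.
    """
    mapping_ids = []
    for name in function_names:
        response = lambda_client.create_event_source_mapping(
            EventSourceArn=stream_arn,
            FunctionName=name,
            StartingPosition="TRIM_HORIZON",  # start from the oldest record
            BatchSize=batch_size,
        )
        mapping_ids.append(response["UUID"])
    return mapping_ids


# Example usage (hypothetical ARN and function names):
#   import boto3
#   attach_functions(
#       boto3.client("lambda"),
#       "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
#       ["migrate-a", "migrate-b"],
#   )
```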
