Can I invoke Lambda functions in parallel using a single Kinesis shard if record order doesn't matter?

Question

I've got an application for which I only need the bandwidth of 1 Kinesis shard, but I need many lambda function invocations in parallel to keep up with the record processing. My record size is on the high end (some of them encroach on the 1000 KB limit), but the incoming rate is only 1 MB/s, as I'm using a single EC2 instance to populate the stream. Since each record contains an internal timestamp, I don't care about processing them in order. Basically I have several months' worth of data that I need to migrate, and I want to do it in parallel.

The processed records feed a database cluster that can handle 1000 concurrent clients, so my previous solution was to split my Kinesis stream into 50 shards. However, this has proved expensive, since all I need the shards for is to parallelize the processing. I'm using less than 1% of the bandwidth, and I had to increase the retention period.

Long term, I imagine the answer involves splitting my records up, so that the consumption time isn't such a huge multiple of the production time. That's not an option right now, but I realize I'm abusing the system slightly.

Is there a way I can have one order-preserving lambda function associated with a single-shard Kinesis stream, and let it invoke another lambda function asynchronously on a batch of records? Then I could use a single Kinesis shard (or other data source) and still enjoy massively parallel processing.

Really all I need is an option in the Lambda Event Source configuration for Kinesis to say "I don't care about preserving order of these records." But then I suppose keeping up with the iterator position on failed executions becomes more of a challenge.
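For illustration, the dispatcher idea described in the question could look roughly like the sketch below. It is only a sketch under assumptions: the worker function name "record-worker", the chunk size, and the helper names are hypothetical, and the only AWS call used is the boto3 Lambda client's invoke with InvocationType="Event" (asynchronous invocation). Asynchronous payloads are size-limited, so records close to the Kinesis record limit may not fit and might have to be passed by reference instead; check the current Lambda quotas before relying on this.

```python
import json


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fan_out(event, lambda_client, worker_name, chunk_size=1):
    """Forward each chunk of Kinesis records to a worker function asynchronously.

    `lambda_client` is expected to behave like boto3.client("lambda").
    Returns the number of records that were handed off.
    """
    records = event.get("Records", [])
    for group in chunk(records, chunk_size):
        lambda_client.invoke(
            FunctionName=worker_name,       # hypothetical worker function
            InvocationType="Event",         # asynchronous: returns once queued
            Payload=json.dumps({"Records": group}).encode("utf-8"),
        )
    return len(records)


def handler(event, context):
    """Entry point for the function attached to the single-shard stream."""
    import boto3  # imported here so the helpers above run without AWS access
    return fan_out(event, boto3.client("lambda"), "record-worker")
```

Because the dispatcher returns as soon as the batch has been handed off, the shard iterator moves on regardless of what happens in the workers. Failed records would then have to be handled on the worker side, for example through the retry and dead-letter settings of asynchronous invocation, rather than by the Kinesis event source.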

Recommended answer

According to somebody that works in AWS, it is possible to attach several Lambda functions to the same Kinesis stream. That said, I'm testing it with no success for now.

Update: it works fine.
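One caveat, as far as I understand it: each function attached to the stream reads the shard independently, so every function sees every record. To get parallelism rather than duplicated work, each function would need to process only its own share. Below is a minimal sketch of one way to split the records deterministically. The helper names and the worker_index / worker_count parameters are hypothetical (they could come from an environment variable, for instance); the only thing taken from the Lambda Kinesis event format is the record's kinesis.sequenceNumber field.

```python
import hashlib


def owns_record(sequence_number, worker_index, worker_count):
    """Return True if the worker with this index should process the record.

    A stable hash of the sequence number is used so that every function
    reaches the same decision without any coordination.
    """
    digest = hashlib.md5(sequence_number.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count == worker_index


def select_records(event, worker_index, worker_count):
    """Keep only the records of a Kinesis event that belong to this worker."""
    return [
        record
        for record in event.get("Records", [])
        if owns_record(record["kinesis"]["sequenceNumber"], worker_index, worker_count)
    ]
```

Each function would call select_records at the top of its handler and skip the rest. Note that all attached functions still share the read capacity of the single shard, so this only helps when processing time, not read throughput, is the bottleneck, which matches the situation in the question.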
