How to balance Kinesis shards across several record processors?


Question



I am currently writing a simple Kinesis Client Library (KCL) in Go. One of the features I want for my simple KCL is load balancing shards across multiple record processors and EC2 instances. For example, I have two record processors (which will run in separate EC2 instances) and four Kinesis shards. The load balancing feature would allow each record processor to process two Kinesis shards.

I read that the Java KCL implements this, but I can't find the implementation in the library. My question is: how would I implement this feature in Go? Thank you.

Solution

KCL already does load balancing for you.

Here's a basic description of how it works today (keep in mind this is just the basics and is subject to change as Amazon improves the logic):

  • When a worker (which may process multiple shards) starts up, it checks a central DynamoDB database for which shards are owned by workers (creating that database if necessary). This is the "leasing" table.
    • A "lease" is a relationship between a worker and a shard
    • Workers process records only for shards they hold an unexpired lease on
    • A lease expires if its worker fails to emit a "heartbeat" for it in time (heartbeats are typically sent every few seconds, and essentially update the DDB record)
  • It checks the Kinesis stream for which shards are available, and updates the table if needed
  • If any leases have expired, the worker will try to take ownership of the lease - at the database level, it uses the shardId as the key and writes its workerId there
  • If a worker starts and all shards are already taken, it checks what the "balance" is - if it detects an imbalance (i.e. "I own 0 shards and some other worker owns 10 shards"), it initiates a "steal shard" protocol: the old worker stops processing that shard, and the new worker takes over

You're of course free to examine the source code for the KCL on GitHub: https://github.com/awslabs/amazon-kinesis-client - hopefully this explanation gives you more context for understanding the KCL and adapting it to your needs.

