Kinesis partition key always falls in the same shard

Question

I have a Kinesis stream with 2 shards that looks like this:

{
    "StreamDescription": {
        "StreamStatus": "ACTIVE",
        "StreamName": "my-stream",
        "Shards": [
            {
                "ShardId": "shardId-000000000001",
                "HashKeyRange": {
                    "EndingHashKey": "17014118346046923173168730371587",
                    "StartingHashKey": "0"
                }
            },
            {
                "ShardId": "shardId-000000000002",
                "HashKeyRange": {
                    "EndingHashKey": "340282366920938463463374607431768211455",
                    "StartingHashKey": "17014118346046923173168730371588"
                }
            }
        ]
    }
}

The sender side sets a partition key that is usually a UUID. It always falls in shardId-000000000002 above, so the load is not balanced across shards and the system does not scale.

As a side note, Kinesis takes the MD5 hash of the partition key and sends the record to the shard whose hash key range contains the resulting value. In fact, when I tested this on the UUIDs I used, they always fell in the same shard.

echo -n 80f6302fca1e48e590b09af84f3150d3 | md5sum
4527063413b015ade5c01d88595eec11  

17014118346046923173168730371588 < 0x4527063413b015ade5c01d88595eec11 < 340282366920938463463374607431768211455
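The manual check above can be sketched in Python. This is only an illustration of the mapping described in the question: the `SHARDS` table is copied from the stream description, and the helper names `hash_key` and `shard_for` are made up for this example, not part of any AWS SDK.

```python
import hashlib

# Hash key ranges copied from the stream description in the question.
SHARDS = {
    "shardId-000000000001": (0, 17014118346046923173168730371587),
    "shardId-000000000002": (17014118346046923173168730371588, 2**128 - 1),
}


def hash_key(partition_key: str) -> int:
    """Return the MD5 of the partition key as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def shard_for(partition_key: str) -> str:
    """Return the ID of the shard whose range contains the key's hash (hypothetical helper)."""
    value = hash_key(partition_key)
    for shard_id, (start, end) in SHARDS.items():
        if start <= value <= end:
            return shard_id
    raise ValueError("hash key is outside every shard range")


print(shard_for("80f6302fca1e48e590b09af84f3150d3"))
```

Converting the hex digest to an integer first avoids comparing a hexadecimal string against decimal range bounds by eye.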

Any idea on how to solve this?

Recommended answer

After a few hours of investigation, I found the root cause: human error, again. I am sharing the solution here, even though it is simple, to save someone else the time.

The problem arose from the way the original stream was split. When you split a stream that has one shard, you have to calculate the starting hash key of the new child shard. This new hash key is usually in the middle of the parent shard's hash key range.

A newly created shard (the parent) will have the following range:

0 - 340282366920938463463374607431768211455

So, naively, you go to your Windows calculator, paste in "340282366920938463463374607431768211455", and divide it by 2.

The issue I missed, and which is easy to miss, is that the Windows calculator truncates the number without telling you. The number above, once pasted into the calculator, becomes "34028236692093846346337460743176". When you divide that by 2, you get a number that is very small compared to the range of the parent shard. Your records will then not be distributed evenly: almost all of them go to the shard that got the bigger portion of the range.
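To see how lopsided the resulting split is, the truncation can be reproduced in Python, whose integers have arbitrary precision. This is a sketch of the arithmetic only; the 32-digit cutoff is taken from the truncated value quoted above, and the variable names are illustrative.

```python
MAX_HASH_KEY = 2**128 - 1  # 340282366920938463463374607431768211455

# Simulate the calculator silently keeping only the first 32 digits.
truncated = int(str(MAX_HASH_KEY)[:32])

# "Midpoint" computed from the truncated number; this matches the
# StartingHashKey of shardId-000000000002 in the question.
bad_start = truncated // 2

# Fraction of the whole key space that ends up in the first shard.
first_shard_share = bad_start / (MAX_HASH_KEY + 1)

print(bad_start)
print(f"{first_shard_share:.1e}")
```

With a share this small, a uniformly distributed MD5 hash essentially never lands in the first shard, which matches the behaviour described in the question.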

Once you run the number above through a calculator built for big numbers, you get the true middle of the range. I used this one to calculate it: https://defuse.ca/big-number-calculator.htm .
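Any tool with arbitrary-precision integers works for this step. As a minimal sketch, the midpoint can also be computed in Python; the function name `midpoint` is made up for this example, and the result is converted to a decimal string on the assumption that it will be supplied as the new starting hash key of a shard split.

```python
def midpoint(starting_hash_key: int, ending_hash_key: int) -> int:
    """Return the first hash key of the upper half of an inclusive range."""
    size = ending_hash_key - starting_hash_key + 1
    return starting_hash_key + size // 2


# Range of a freshly created single shard, as given above.
parent_start = 0
parent_end = 340282366920938463463374607431768211455

new_starting_hash_key = str(midpoint(parent_start, parent_end))
print(new_starting_hash_key)
```

Python integers never overflow or truncate, so the same function also works for splitting a child shard whose range does not start at 0.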

After this change, the records are perfectly distributed and the system scales nicely.
