Kinesis partition key always falls in the same shard

Question

I have a Kinesis stream with 2 shards that looks like this:

{
    "StreamDescription": {
        "StreamStatus": "ACTIVE",
        "StreamName": "my-stream",
        "Shards": [
            {
                "ShardId": "shardId-000000000001",
                "HashKeyRange": {
                    "EndingHashKey": "17014118346046923173168730371587",
                    "StartingHashKey": "0"
                }
            },
            {
                "ShardId": "shardId-000000000002",
                "HashKeyRange": {
                    "EndingHashKey": "340282366920938463463374607431768211455",
                    "StartingHashKey": "17014118346046923173168730371588"
                }
            }
        ]
    }
}

The sender sets a partition key that is usually a UUID. Every record lands in shardId-000000000002 above, so the load is not balanced and the system cannot scale.

As a side note, Kinesis hashes the partition key with MD5 and sends the record to the shard whose hash key range contains the resulting hash. When I tested this on the UUIDs I used, they did in fact always fall in the same shard.

echo -n 80f6302fca1e48e590b09af84f3150d3 | md5sum
4527063413b015ade5c01d88595eec11  

17014118346046923173168730371588 < 0x4527063413b015ade5c01d88595eec11 < 340282366920938463463374607431768211455
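The same check can be scripted instead of comparing a hex digest against decimal bounds by eye. The following is a minimal sketch, assuming the partition key is hashed as a UTF-8 string; the names `SHARDS`, `hash_key`, and `shard_for` are hypothetical helpers, and the ranges are copied from the stream description above.

```python
import hashlib

# Hash key ranges copied from the stream description in the question.
SHARDS = {
    "shardId-000000000001": (0, 17014118346046923173168730371587),
    "shardId-000000000002": (17014118346046923173168730371588,
                             340282366920938463463374607431768211455),
}


def hash_key(partition_key: str) -> int:
    """Return the MD5 digest of the partition key as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def shard_for(partition_key: str) -> str:
    """Return the ID of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for shard_id, (start, end) in SHARDS.items():
        if start <= h <= end:
            return shard_id
    raise ValueError("hash key is outside every shard range")


if __name__ == "__main__":
    print(shard_for("80f6302fca1e48e590b09af84f3150d3"))
```

Running it over a batch of real partition keys and counting the shard IDs shows at a glance how skewed the distribution is.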

Any ideas on how to solve this?

Recommended answer

After a few hours of investigation, I found the root cause: human error, once again. I'm sharing the solution here, simple as it is, to save someone else the time.

The problem arose from the way the original stream was split. When you split a stream that has one shard, you have to calculate the starting hash key of the new child shard. This new hash key is usually the midpoint of the parent shard's hash key range.

A newly created shard (the parent) has the following range:

0 - 340282366920938463463374607431768211455

So, naively, you open the Windows calculator, paste in "340282366920938463463374607431768211455", and divide it by 2.

What I missed, and what is easy to miss, is that the Windows calculator silently truncates the number. Pasted into the calculator, the number above becomes "34028236692093846346337460743176" (32 digits instead of 39). Dividing that by 2 gives 17014118346046923173168730371588, which is the boundary between the two shards in the question and is tiny compared with the parent shard's range. The records are then not distributed evenly: they go to the shard that got the far larger portion of the range.

Once you enter the number above into a calculator that handles big numbers, you get the true midpoint of the range. I used this one to calculate it: https://defuse.ca/big-number-calculator.htm .
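Any tool with arbitrary-precision integers works for this calculation. As a minimal sketch, Python integers never truncate, so the midpoint can be computed directly; the names `PARENT_START`, `PARENT_END`, and `fraction_of_range` are hypothetical and used only for illustration.

```python
# Hash key range of the single parent shard, as quoted in the answer.
PARENT_START = 0
PARENT_END = 340282366920938463463374607431768211455  # 2**128 - 1

# Python integers have arbitrary precision, so no digits are lost here.
midpoint = (PARENT_START + PARENT_END) // 2

# What the truncated calculator input (first 32 digits only) produced.
truncated_half = 34028236692093846346337460743176 // 2


def fraction_of_range(boundary: int) -> float:
    """Fraction of the parent's hash key space that lies below `boundary`."""
    return boundary / (PARENT_END + 1)


if __name__ == "__main__":
    print("correct midpoint:  ", midpoint)
    print("truncated boundary:", truncated_half)
    # Share of the hash key space each boundary leaves to the lower shard.
    print("lower shard share (correct):  ", fraction_of_range(midpoint))
    print("lower shard share (truncated):", fraction_of_range(truncated_half))
```

With the truncated boundary, the lower shard covers only a minuscule fraction of the hash key space, which is why practically every UUID hashed into the other shard.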

After this change, the records are evenly distributed and the system scales nicely.
