How to decide the total number of partition keys in an AWS Kinesis stream?


Problem description

In a producer-consumer web application, what should the thought process be when creating a partition key for a Kinesis stream shard? Suppose I have a Kinesis stream with 16 shards: how many partition keys should I create? Does it really depend on the number of shards?

Recommended answer

Partition key vs. hash key: Kinesis maps every partition key to a 128-bit integer hash key using MD5, so hash keys range from 0 up to 340282366920938463463374607431768211455 (2^128 - 1). That is roughly 34028 * 10^34; to keep the arithmetic simple, let's round it to ~34020 * 10^34 (which divides evenly by 30) and omit the 10^34 factor from here on.
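To make the key-to-hash mapping concrete, here is a minimal Java sketch of the MD5 step. It assumes the partition key is encoded as UTF-8 and that the 16-byte digest is read as an unsigned big-endian integer; the class name PartitionKeyHash is hypothetical, and the sketch does not call the AWS SDK.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PartitionKeyHash {
    // Largest possible hash key: 2^128 - 1.
    static final BigInteger MAX_HASH_KEY = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

    /** MD5 of the partition key's UTF-8 bytes, read as an unsigned big-endian 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            // signum = 1 treats the 16 digest bytes as a non-negative magnitude,
            // so the result always lies in 0 .. 2^128 - 1.
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // prints "max hash key = 340282366920938463463374607431768211455"
        System.out.println("max hash key = " + MAX_HASH_KEY);
        for (String key : new String[] {"user-42", "user-43", "1700000000000"}) {
            System.out.println(key + " -> " + hashKey(key));
        }
    }
}
```

The same key always produces the same hash key, which is why records sharing a partition key always land on the same shard.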

If you have 30 shards dividing this range uniformly, each shard covers 1134 * 10^34 hash keys (34020 / 30 = 1134). The coverage looks like this:
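The even split can also be computed exactly instead of in rounded units. A minimal sketch, assuming shard i of n owns the keys from floor(i * 2^128 / n) up to one less than the next shard's start; the class name HashKeyRanges is hypothetical, and the exact boundaries AWS picks when it creates a stream may differ slightly.

```java
import java.math.BigInteger;

public class HashKeyRanges {
    static final BigInteger KEY_SPACE = BigInteger.ONE.shiftLeft(128);         // 2^128 keys in total
    static final BigInteger MAX_HASH_KEY = KEY_SPACE.subtract(BigInteger.ONE); // 2^128 - 1

    /** Inclusive {start, end} of shard i when the key space is split evenly into n shards. */
    static BigInteger[] range(int i, int n) {
        BigInteger shards = BigInteger.valueOf(n);
        BigInteger start = KEY_SPACE.multiply(BigInteger.valueOf(i)).divide(shards);
        BigInteger nextStart = KEY_SPACE.multiply(BigInteger.valueOf(i + 1L)).divide(shards);
        return new BigInteger[] {start, nextStart.subtract(BigInteger.ONE)};
    }

    public static void main(String[] args) {
        int n = 30;
        for (int i = 0; i < n; i++) {
            BigInteger[] r = range(i, n);
            System.out.printf("Shard-%02d: %s - %s%n", i, r[0], r[1]);
        }
    }
}
```

For n = 30 the last range ends exactly at 2^128 - 1, and each range starts one key after the previous one ends, so the ranges cover the whole key space without gaps or overlaps.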

Shard-00: 0 - 1134
Shard-01: 1135 - 2268
Shard-02: 2269 - 3402
Shard-03: 3403 - 4536
...
Shard-27: 30619 - 31752
Shard-28: 31753 - 32886
Shard-29: 32887 - 34020

If you have 3 consumer applications reading from these 30 shards, each should read from 10 shards for an optimal balance.

This also explains the Merge and Split operations on a stream.

  • To merge 2 shards, they must cover adjacent hash key ranges. For example, you cannot merge Shard-03 and Shard-29.
  • You can split any shard. If you split Shard-00 in the middle, the distribution becomes:

Shard-30: 0 - 567
Shard-31: 568 - 1134
Shard-01: 1135 - 2268
Shard-02: 2269 - 3402
Shard-03: 3403 - 4536
...
Shard-27: 30619 - 31752
Shard-28: 31753 - 32886
Shard-29: 32887 - 34020

Shard-00 will no longer accept new data. New records put into the stream whose partition keys hash into Shard-00's former range (0 - 1134) will be placed in Shard-30 or Shard-31.
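The merge and split rules reduce to simple range arithmetic. The sketch below works in the same scaled units as the example (x 10^34); the class and method names are hypothetical. splitInHalf returns the two child ranges, where the second child's start plays the role of the NewStartingHashKey parameter of the SplitShard API, and adjacent encodes the MergeShards requirement that the two ranges together form one contiguous range.

```java
import java.math.BigInteger;

public class ReshardMath {
    /** Inclusive hash key range [start, end] of one shard. */
    static final class Range {
        final BigInteger start;
        final BigInteger end;

        Range(long start, long end) {
            this(BigInteger.valueOf(start), BigInteger.valueOf(end));
        }

        Range(BigInteger start, BigInteger end) {
            if (start.compareTo(end) > 0) throw new IllegalArgumentException("start > end");
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + " - " + end;
        }
    }

    /** Merge rule: the two ranges must touch, i.e. together form one contiguous range. */
    static boolean adjacent(Range a, Range b) {
        return a.end.add(BigInteger.ONE).equals(b.start)
            || b.end.add(BigInteger.ONE).equals(a.start);
    }

    /** The merged shard covers both ranges; only valid for adjacent shards. */
    static Range merge(Range a, Range b) {
        if (!adjacent(a, b)) throw new IllegalArgumentException("shards are not adjacent");
        return new Range(a.start.min(b.start), a.end.max(b.end));
    }

    /**
     * Splits a shard in the middle. children[1].start corresponds to the new
     * starting hash key: that key and everything above it go to one child,
     * everything below it goes to the other.
     */
    static Range[] splitInHalf(Range parent) {
        if (parent.start.equals(parent.end)) throw new IllegalArgumentException("cannot split a single key");
        BigInteger newStart = parent.start.add(parent.end).shiftRight(1).add(BigInteger.ONE);
        return new Range[] {
            new Range(parent.start, newStart.subtract(BigInteger.ONE)),
            new Range(newStart, parent.end)
        };
    }

    public static void main(String[] args) {
        // Scaled units (x 10^34), as in the example.
        Range shard00 = new Range(0, 1134);
        Range shard01 = new Range(1135, 2268);
        Range lastShard = new Range(32887, 34020);

        System.out.println(adjacent(shard00, shard01));   // true
        System.out.println(adjacent(shard00, lastShard)); // false
        System.out.println(merge(shard00, shard01));      // 0 - 2268

        Range[] children = splitInHalf(shard00);
        System.out.println(children[0] + " | " + children[1]); // 0 - 567 | 568 - 1134
    }
}
```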

When sending data to Kinesis (i.e., on the producer side), you should not worry about which shard the data goes to. Using a random number, a UUID, or the current timestamp in milliseconds as the partition key is best for scaling and for distributing the data evenly across shards. Records with the same partition key always go to the same shard, so unless you need ordering of related records within a single shard, it is best to choose a random or constantly changing partition key for each put_record request. The number of distinct partition keys therefore does not depend on the shard count; it only needs to be much larger than the number of shards so that the keys spread evenly.
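To see why random keys spread the load, the following sketch hashes 16,000 generated keys onto the 16-shard stream from the question, once with random UUIDs and once with a single fixed key. It assumes MD5 over UTF-8 bytes and an even split of the key space; KeySpreadDemo and the key "device-1" are hypothetical, and nothing is sent to Kinesis.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;
import java.util.function.Supplier;

public class KeySpreadDemo {
    static final BigInteger KEY_SPACE = BigInteger.ONE.shiftLeft(128); // 2^128

    /** MD5 of the key's UTF-8 bytes as an unsigned 128-bit integer (assumed mapping). */
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Shard index under an even split: shard i owns [i * 2^128 / n, (i + 1) * 2^128 / n). */
    static int shardFor(String partitionKey, int shards) {
        return hashKey(partitionKey).multiply(BigInteger.valueOf(shards)).divide(KEY_SPACE).intValue();
    }

    /** Counts how many of `records` generated keys land on each shard. */
    static int[] histogram(Supplier<String> keySource, int records, int shards) {
        int[] counts = new int[shards];
        for (int r = 0; r < records; r++) {
            counts[shardFor(keySource.get(), shards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int shards = 16;      // the 16-shard stream from the question
        int records = 16_000; // about 1,000 records per shard if spread evenly
        System.out.println("random UUID keys: "
            + Arrays.toString(histogram(() -> UUID.randomUUID().toString(), records, shards)));
        System.out.println("one fixed key:    "
            + Arrays.toString(histogram(() -> "device-1", records, shards)));
    }
}
```

With random keys each of the 16 counts should come out near 1,000; with one fixed key all 16,000 records hit a single shard, which is the hot-shard situation that random keys avoid.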

In Java (AWS SDK for Java 1.x), for example, you can use putRecordsRequestEntry.setPartitionKey(Long.toString(System.currentTimeMillis())) or putRecordRequest.setPartitionKey(Long.toString(System.currentTimeMillis())). Keep in mind that records produced within the same millisecond get the same timestamp key and therefore land on the same shard.
