How to decide the total number of partition keys in an AWS Kinesis stream?

Question

In a producer-consumer web application, what should the thought process be for choosing partition keys for a Kinesis stream? Suppose I have a Kinesis stream with 16 shards: how many partition keys should I create? Does it really depend on the number of shards?

Recommended answer

Partition (or hash) key: Kinesis hashes each partition key with MD5 into a 128-bit hash key, which ranges from 0 up to 340282366920938463463374607431768211455 (2^128 - 1). Let's say ~34020 * 10^34; I will omit the 10^34 for ease.

If you have 30 shards, uniformly divided, each should cover about 1134 * 10^34 hash keys. The coverage would look like this (in units of 10^34):

    Shard 00: 0 - 1134
    Shard 01: 1135 - 2268
    Shard 03: 2269 - 3402
    Shard 04: 3403 - 4536
    ...
    Shard 28: 30619 - 31752
    Shard 29: 31753 - 32886
    Shard 30: 32887 - 34020
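The same division can be worked out on the real 128-bit key space instead of the rounded figures. The following is a minimal sketch, assuming the key space is cut into equal slices and that the partition key is encoded as UTF-8 before MD5 hashing; the class and method names (`HashKeyRanges`, `rangeStart`, `rangeEnd`, `hashKey`, `sliceFor`) are hypothetical helpers for illustration, not part of the AWS SDK.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashKeyRanges {
    // Size of the hash key space: 2^128 (keys run from 0 to 2^128 - 1).
    static final BigInteger KEY_SPACE = BigInteger.ONE.shiftLeft(128);

    // Inclusive start of slice `index` when the key space is cut into
    // `shardCount` equal slices (hypothetical helper).
    static BigInteger rangeStart(int index, int shardCount) {
        return KEY_SPACE.multiply(BigInteger.valueOf(index))
                        .divide(BigInteger.valueOf(shardCount));
    }

    // Inclusive end of slice `index`: one below the start of the next slice.
    static BigInteger rangeEnd(int index, int shardCount) {
        return rangeStart(index + 1, shardCount).subtract(BigInteger.ONE);
    }

    // MD5 of the partition key, read as an unsigned 128-bit integer.
    // Assumption: the key is encoded as UTF-8 before hashing.
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Index of the slice whose range contains the hash of the partition key.
    static int sliceFor(String partitionKey, int shardCount) {
        BigInteger hash = hashKey(partitionKey);
        for (int i = 0; i < shardCount; i++) {
            if (hash.compareTo(rangeEnd(i, shardCount)) <= 0) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable: the last slice ends at 2^128 - 1");
    }

    public static void main(String[] args) {
        int shardCount = 30;
        for (int i = 0; i < shardCount; i++) {
            System.out.printf("slice %02d: %s - %s%n",
                    i, rangeStart(i, shardCount), rangeEnd(i, shardCount));
        }
        // "user-42" is an arbitrary example key.
        System.out.println("user-42 -> slice " + sliceFor("user-42", shardCount));
    }
}
```

The slice index here is only a position in the evenly divided key space; real shard IDs are assigned by Kinesis and need not be consecutive.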

And if you have 3 consumer applications listening to these 30 shards, each should listen to 10 shards (the optimum balance).

This also explains the Merge and Split operations on a stream.

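  • To merge 2 shards, they must cover adjacent hash key ranges. You cannot merge Shard-03 and Shard-29.
  • You can split any shard. If you split Shard-00 in the middle, the distribution would look like this: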

    Shard 31: 0 - 567
    Shard 32: 568 - 1134
    Shard 01: 1135 - 2268
    Shard 03: 2269 - 3402
    Shard 04: 3403 - 4536
    ...
    Shard 28: 30619 - 31752
    Shard 29: 31753 - 32886
    Shard 30: 32887 - 34020

After the split, Shard-00 no longer accepts new data. New records put into the Kinesis stream whose hash keys fall in the range that Shard-00 used to cover are placed in Shard-31 or Shard-32.
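The split and merge rules above can be sketched with the rounded numbers from the tables. This is only an illustration under stated assumptions: `ShardRangeOps`, `Range`, `range`, `splitInMiddle` and `adjacent` are hypothetical names, not AWS SDK classes, and the values are in units of 10^34. In the real SplitShard API the caller supplies the starting hash key of the second child (`NewStartingHashKey`), which corresponds to `mid + 1` below.

```java
import java.math.BigInteger;

public class ShardRangeOps {
    // Inclusive hash key range of one shard (hypothetical helper type).
    static final class Range {
        final BigInteger start;
        final BigInteger end;

        Range(BigInteger start, BigInteger end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + " - " + end;
        }
    }

    // Convenience factory for the small, rounded numbers used in the answer.
    static Range range(long start, long end) {
        return new Range(BigInteger.valueOf(start), BigInteger.valueOf(end));
    }

    // Split a range in the middle into two children that together cover the parent.
    static Range[] splitInMiddle(Range parent) {
        BigInteger mid = parent.start.add(parent.end).shiftRight(1);
        return new Range[] {
            new Range(parent.start, mid),
            new Range(mid.add(BigInteger.ONE), parent.end)
        };
    }

    // Two ranges can be merged only if one ends right before the other starts.
    static boolean adjacent(Range a, Range b) {
        return a.end.add(BigInteger.ONE).equals(b.start)
            || b.end.add(BigInteger.ONE).equals(a.start);
    }

    public static void main(String[] args) {
        Range shard00 = range(0, 1134);
        Range shard01 = range(1135, 2268);
        Range shard03 = range(2269, 3402);
        Range shard29 = range(31753, 32886);

        Range[] children = splitInMiddle(shard00);
        System.out.println("first child:  " + children[0]);
        System.out.println("second child: " + children[1]);

        System.out.println("01 and 03 mergeable: " + adjacent(shard01, shard03));
        System.out.println("03 and 29 mergeable: " + adjacent(shard03, shard29));
    }
}
```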

While sending data to Kinesis (i.e. on the producer side), you should not worry about which shard the data goes to. Sending a random number (or a UUID, or the current timestamp in milliseconds) as the partition key works best for scaling and distributing the data evenly across shards. Unless you care about the ordering of records within a single shard, it is best to choose a random or constantly changing partition key for each put_record request. So the number of distinct partition keys does not need to match the number of shards; it should simply be much larger than it.

In Java, for example, you can use "putRecordsRequestEntry.setPartitionKey(Long.toString(System.currentTimeMillis()))" or "putRecordRequest.setPartitionKey(Long.toString(System.currentTimeMillis()))".
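To see why a constantly changing key spreads load while a fixed key does not, the mapping can be simulated locally without calling Kinesis. This is a rough sketch, assuming 16 evenly sized shards (as in the question) and UTF-8 encoding of the key before MD5 hashing; `PartitionKeySpread`, `sliceFor`, `spread` and `slicesUsed` are hypothetical names introduced for the illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.UUID;
import java.util.function.Supplier;

public class PartitionKeySpread {
    // Which of `shardCount` roughly equal slices of the 128-bit key space
    // the MD5 of the partition key falls in: floor(hash * shardCount / 2^128).
    static int sliceFor(String partitionKey, int shardCount) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest)
                    .multiply(BigInteger.valueOf(shardCount))
                    .shiftRight(128)
                    .intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Count how many of `records` simulated puts land in each slice.
    static int[] spread(Supplier<String> keySource, int records, int shardCount) {
        int[] counts = new int[shardCount];
        for (int i = 0; i < records; i++) {
            counts[sliceFor(keySource.get(), shardCount)]++;
        }
        return counts;
    }

    // Number of slices that received at least one record.
    static int slicesUsed(int[] counts) {
        int used = 0;
        for (int count : counts) {
            if (count > 0) {
                used++;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        int shardCount = 16;
        int records = 10_000;

        // A fresh UUID per record: records scatter over all slices.
        int[] randomKeys = spread(() -> UUID.randomUUID().toString(), records, shardCount);
        // One fixed key: every record lands in the same slice.
        int[] fixedKey = spread(() -> "same-key", records, shardCount);

        System.out.println("random keys: " + Arrays.toString(randomKeys));
        System.out.println("fixed key:   " + Arrays.toString(fixedKey));
    }
}
```

A fixed key is still the right choice when all records for one entity must stay in order, since records with the same partition key always go to the same shard.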
