What is partition key in AWS Kinesis all about?

Question

I was reading about AWS Kinesis. In the following program, I write data into the stream named TestStream. I ran this piece of code 10 times, inserting 10 records into the stream.

// AWS SDK for JavaScript (v2); the region below is an assumption.
var AWS = require('aws-sdk');
var kinesis = new AWS.Kinesis({ region: 'us-east-1' });

var params = {
    Data: 'More Sample data into the test stream ...',
    PartitionKey: 'TestKey_1',
    StreamName: 'TestStream'
};

kinesis.putRecord(params, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else     console.log(data);           // successful response
});

All the records were inserted successfully. What does the partition key really mean here? What is it doing in the background? I read its documentation but did not understand what it meant.

Answer

Partition keys only matter when you have multiple shards in a stream (but they're always required). Kinesis computes the MD5 hash of a partition key to decide which shard to store the record on (if you describe the stream, you'll see the hash range as part of each shard's description).
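
For illustration, a minimal sketch (reusing the kinesis client from the question) that prints each shard's hash-key range via describeStream, so you can see which MD5 values map to which shard:

// Sketch: list each shard of TestStream and its hash-key range.
kinesis.describeStream({ StreamName: 'TestStream' }, function(err, data) {
    if (err) return console.log(err, err.stack);
    data.StreamDescription.Shards.forEach(function(shard) {
        console.log(shard.ShardId + ': ' +
                    shard.HashKeyRange.StartingHashKey + ' - ' +
                    shard.HashKeyRange.EndingHashKey);
    });
});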

So why does this matter?

Each shard can only accept 1,000 records and/or 1 MB per second (see the PutRecord doc). If you write to a single shard faster than this rate, you'll get a ProvisionedThroughputExceededException.

With multiple shards, you can scale this limit: 4 shards give you 4,000 records and/or 4 MB per second. Of course, there are caveats.
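
As a rough sketch of how that exception surfaces with putRecord, you could back off and retry on it (the fixed 1-second delay is an assumption; real code would typically use exponential backoff):

// Sketch: retry a throttled putRecord after a short delay.
function putWithRetry(params) {
    kinesis.putRecord(params, function(err, data) {
        if (err && err.code === 'ProvisionedThroughputExceededException') {
            setTimeout(function() { putWithRetry(params); }, 1000); // assumed delay
        } else if (err) {
            console.log(err, err.stack);
        } else {
            console.log(data);
        }
    });
}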

The biggest is that you must use different partition keys. If all of your records use the same partition key then you're still writing to a single shard, because they'll all have the same hash value. How you solve this depends on your application: if you're writing from multiple processes then it might be sufficient to use the process ID, server's IP address, or hostname. If you're writing from a single process then you can either use information that's in the record (for example, a unique record ID) or generate a random string.
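
As an illustration of the single-process case, a sketch that uses a random hex string per record (via Node's built-in crypto module) so the hashes spread across shards:

// Sketch: a random partition key per record, so records spread across shards.
var crypto = require('crypto');

kinesis.putRecord({
    Data: 'More Sample data into the test stream ...',
    PartitionKey: crypto.randomBytes(16).toString('hex'),
    StreamName: 'TestStream'
}, function(err, data) {
    if (err) console.log(err, err.stack);
    else     console.log(data);
});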

The second caveat is that the partition key counts against the total write size and is stored in the stream. So while you could probably get good randomness by using some textual component of the record, you'd be wasting space. On the other hand, if you have such a random textual component, you could calculate your own hash from it and then stringify that as the partition key.
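
For example, a sketch that hashes one field of the record (recordId here is a hypothetical field name) and uses the hex digest as a compact partition key:

// Sketch: derive a short partition key from a record field.
var crypto = require('crypto');

function partitionKeyFor(record) {
    return crypto.createHash('md5')
                 .update(record.recordId)   // recordId is a hypothetical field
                 .digest('hex');            // 32 chars, counts against the write size
}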

Lastly, if you're using PutRecords (which you should, if you're writing a lot of data), individual records in the request may be rejected while others are accepted. This happens because those records went to a shard that was already at its write limits, and you have to re-send them (after a delay).
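
A minimal sketch of that re-send loop with putRecords (records is assumed to be an array of { Data, PartitionKey } entries; the 1-second delay is an assumption):

// Sketch: batch write, then re-send only the rejected records after a delay.
function putBatch(records) {
    kinesis.putRecords({ StreamName: 'TestStream', Records: records }, function(err, data) {
        if (err) return console.log(err, err.stack);
        if (data.FailedRecordCount > 0) {
            var failed = records.filter(function(r, i) {
                return data.Records[i].ErrorCode; // set only on rejected records
            });
            setTimeout(function() { putBatch(failed); }, 1000);
        }
    });
}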
