Expected behavior for AWS Kinesis ShardIteratorType TRIM_HORIZON

Question

Context: I'm not necessarily referring to a KCL-based application, just pure Kinesis API calls.

Does using the TRIM_HORIZON shard iterator type immediately give you the earliest published record in the stream (i.e. the earliest available within Kinesis' built-in 24-hour window), or simply an iterator/cursor for some point in time as much as 24 hours ago, which you must then use to advance along the stream until you hit the earliest published record?

Put another way, in case that's not quite clear....

When using the TRIM_HORIZON shard iterator type, is the expected behavior that it begins by returning the records that were available 24 hours ago, but that if zero records were published exactly 24 hours ago, and records were instead published only 3 hours ago, your application needs to iteratively poll through the previous 21 hours before it reaches the records published 3 hours ago?

Example timeline:

  1. Sept 29 5:00 am - Create a stream "foo" with 1 shard
  2. Sept 29 5:02 am - Publish a single record, "Item=A", to the "foo" stream
  3. Sept 29 5:03 am - Issue a GetShardIterator call with TRIM_HORIZON as your shard iterator type, then issue a GetRecords call with that shard iterator and receive the record "Item=A"
  4. Sept 30 7:02 am - Publish a second record, "Item=B", to the "foo" stream
  5. Sept 30 7:03 am - Issue a GetShardIterator call with TRIM_HORIZON as your shard iterator type, then issue a GetRecords call with that shard iterator. What should be expected as the result from this call? (Note: we did not remember/re-use the shard iterator from step 3)

For Step 5 above, it's been more than 24 hours since the "Item=A" message was published on the stream and only a minute since "Item=B" was published. Will a fresh shard iterator with TRIM_HORIZON immediately give you the earliest available record, or do you need to keep iterating until you hit a time period when something was published?

I'd been experimenting with Kinesis and everything was working fine a day or two ago (i.e. I was publishing AND consuming without any issues). I made some additional modifications to my code and began publishing again today. When I fired up my consumer, nothing came out at all, even after letting it run for a few minutes. I tried publishing and consuming at exactly the same time, and still nothing. After manually playing with the AFTER_SEQUENCE_NUMBER iterator type, and using some sequence numbers from my consumer logs from a few days ago, I was able to reach my recently published messages. But if I go back to using the TRIM_HORIZON type, I see no messages at all.
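For illustration only, the AFTER_SEQUENCE_NUMBER workaround described above might look roughly like the sketch below. It is not the original consumer code. It assumes `client` is a boto3-style Kinesis client (for example `boto3.client("kinesis")`), and the function name `records_after` is hypothetical.

```python
def records_after(client, stream_name, shard_id, sequence_number):
    """Fetch one batch of records that follow a known sequence number.

    `client` is assumed to expose the boto3 Kinesis methods
    get_shard_iterator and get_records.
    """
    # Ask for an iterator positioned right after the given record.
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="AFTER_SEQUENCE_NUMBER",
        StartingSequenceNumber=sequence_number,
    )["ShardIterator"]

    # A single GetRecords call; the batch may still be empty.
    return client.get_records(ShardIterator=iterator)["Records"]
```

The sequence number has to come from somewhere, here from old consumer logs, which is why this is a workaround rather than a replacement for TRIM_HORIZON.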

I've looked at the docs, but most of the docs I found assume you are using the KCL (I actually was using the KCL initially, but when it started failing I dropped down to raw API calls) and mention that you must have an application name and that DynamoDB tables are used for tracking state. As best I can tell, that is not true if you're using pure Kinesis API calls or the Kinesis CLI, both of which I eventually tried. I finally wrote a pure API script that starts with TRIM_HORIZON and polls indefinitely, and eventually it hit new records (it took ~600 iterations; it started out 14 hours behind "now" and found records at about 5 hours behind "now"). If this is expected behavior, it seems like the wording in the docs is just a little confusing/misleading:

TRIM_HORIZON - Start reading at the last untrimmed record in the shard in the system, which is the oldest data record in the shard.

I assumed (now seemingly incorrectly) that the term "oldest data record" meant a record that I've published into the stream, not simply a point in time in the stream.

It'd be great if someone can help confirm/explain the behavior I'm seeing.

Thanks!

Recommended answer

It starts at the TRIM HORIZON, i.e. the HORIZON where the stream TRIMming happens.

The shard iterator may return 0 records when called, so you'll need to keep iterating to reach the part of the shard where the oldest record is (this happens if you push to the stream infrequently or there are time gaps between records). Each GetRecords response gives you the next shard iterator, which you use for the following call.

From the docs: http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html

If there are no records available in the portion of the shard that the iterator points to, GetRecords returns an empty list. Note that it might take multiple calls to get to a portion of the shard that contains records.
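A minimal sketch of that polling loop, assuming `client` is a boto3-style Kinesis client (for example `boto3.client("kinesis")`). The function name `read_oldest_records` and the `max_calls` and `pause_seconds` parameters are hypothetical and chosen only for illustration.

```python
import time


def read_oldest_records(client, stream_name, shard_id,
                        max_calls=1000, pause_seconds=0.2):
    """Poll from TRIM_HORIZON until GetRecords returns a non-empty batch.

    Returns (records, number_of_get_records_calls). `records` is an empty
    list if nothing was found within `max_calls` calls.
    """
    # Start at the trim horizon: the oldest retained position in the
    # shard, which is not necessarily a position that holds a record.
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    calls = 0
    for calls in range(1, max_calls + 1):
        response = client.get_records(ShardIterator=iterator)
        if response["Records"]:
            return response["Records"], calls

        # Empty batch: advance with the iterator returned by this call.
        iterator = response.get("NextShardIterator")
        if iterator is None:
            # No further iterator, e.g. the shard has been closed.
            break
        time.sleep(pause_seconds)  # stay under the per-shard read limits

    return [], calls
```

With a real client this might be called as `read_oldest_records(boto3.client("kinesis"), "foo", "shardId-000000000000")`, where the shard ID is an assumed value. The call count it returns corresponds to the "~600 iterations" observed in the question.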
