Get the last 10 items in DynamoDB across partitions


Question

I have a table containing blogs posted by different people; the primary key is author + time. How can I query the last 4 blogs ordered by time (that is, get blog6, blog3, blog5, blog4)?

If I create a global secondary index (i.e. I add a new attribute called status, set every value to "ok", and use status + time as the index's primary key), I know I can solve my problem. But the consequence is that all of the data in the index will be stored in a single partition.

Will that cause any weakness?

Recommended answer

It looks like you're on the right track. You are absolutely right about queries: they only give you the records for a given partition key.

If you need data ordered by time, regardless of partition key, then you will need to use a global secondary index.

Your idea of creating a GSI on status is a step in the right direction. Unfortunately, as you suspected, it would put pressure on your index, because every record in the index shares the same partition key value and so is forced into the same partition. This pretty much defeats the scalability of DynamoDB.

But all is not lost. You could create an attribute that is a coarse representation of each record's timestamp and use it as the index's partition key. An example might be the month, or the day of the year. This would allow records to be placed in up to 12 partitions for the former, or 365 partitions for the latter. The compromise is that you would need a couple of queries to find the most recent entries instead of a single query, though performance-wise it would be much better.
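As a rough illustration of the "couple of queries" trade-off, here is a minimal Python sketch that walks the buckets from newest to oldest until enough items have been collected. It makes several assumptions that are not in the original answer: the bucket value includes the year (for example "2024-03"), so that buckets are themselves ordered in time; the GSI uses a hypothetical attribute named `month_bucket` as its partition key with time as its sort key; and the names `month_buckets`, `latest_items` and `make_dynamodb_query` are made up for this example.

```python
from datetime import date
from typing import Any, Callable, Dict, Iterator, List

Item = Dict[str, Any]
# A callable that returns up to `limit` items of one bucket, newest first.
QueryBucket = Callable[[str, int], List[Item]]


def month_buckets(start: date, max_buckets: int) -> Iterator[str]:
    """Yield 'YYYY-MM' bucket keys, starting at `start` and going back in time."""
    year, month = start.year, start.month
    for _ in range(max_buckets):
        yield f"{year:04d}-{month:02d}"
        month -= 1
        if month == 0:
            year, month = year - 1, 12


def latest_items(query_bucket: QueryBucket, start: date, limit: int,
                 max_buckets: int = 24) -> List[Item]:
    """Collect the `limit` newest items by querying one bucket at a time."""
    found: List[Item] = []
    for bucket in month_buckets(start, max_buckets):
        # Only ask for as many items as are still missing.
        found.extend(query_bucket(bucket, limit - len(found)))
        if len(found) >= limit:
            break
    return found[:limit]


def make_dynamodb_query(table: Any, index_name: str) -> QueryBucket:
    """Wrap a boto3 Table so it can be passed to latest_items().

    Assumes the GSI's partition key is the hypothetical attribute
    'month_bucket' and its sort key is the post time.
    """
    from boto3.dynamodb.conditions import Key  # imported only when needed

    def query_bucket(bucket: str, limit: int) -> List[Item]:
        response = table.query(
            IndexName=index_name,
            KeyConditionExpression=Key("month_bucket").eq(bucket),
            ScanIndexForward=False,  # descending sort key: newest first
            Limit=limit,
        )
        return response["Items"]

    return query_bucket
```

With a real table this might be called as `latest_items(make_dynamodb_query(table, "by_month"), date.today(), 4)`, where `by_month` is a hypothetical index name. If most posts are recent, the first one or two buckets are usually enough; `max_buckets` caps the number of queries when the table is sparse.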

Yet another possibility, depending on your needs, would be to create an external index. Maybe you could have your system keep a cache of the most recently created blog posts. As new posts are created, they get added to the cache. As posts get "old", they get evicted from the cache. You do have to resolve the persistence problem, but you could also rebuild the cache, if need be, by scanning your table.
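One way to picture such a cache is a small bounded structure that keeps only the newest N posts. The sketch below is an in-memory illustration only, not a persistence solution; the class name `RecentPostsCache` and the assumption that post times are ISO-8601 strings (which sort chronologically as plain strings) are invented for this example.

```python
import heapq
from typing import List, Tuple


class RecentPostsCache:
    """Keep only the `capacity` newest posts, evicting the oldest as needed."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        # Min-heap of (time, post_id); the oldest cached post sits at index 0.
        self._heap: List[Tuple[str, str]] = []

    def add(self, time: str, post_id: str) -> None:
        """Add a post; if the cache is full, the oldest entry is dropped."""
        entry = (time, post_id)
        if len(self._heap) < self._capacity:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # Newer than the oldest cached post: swap it in.
            heapq.heapreplace(self._heap, entry)

    def latest(self, n: int) -> List[str]:
        """Return up to `n` post ids, newest first."""
        return [post_id for _, post_id in heapq.nlargest(n, self._heap)]
```

Rebuilding after a restart would amount to calling `add` for every item returned by a table scan; because `add` ignores anything older than what is already cached, the order in which the scan returns items does not matter.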

Or you could even use another DynamoDB table (or a relational database) to store the most recent blog posts. As long as this set of recent posts is relatively small, you should be fine.
