DynamoDB schema for querying date range


Question

I'm learning to use DynamoDB and am storing job postings with attributes such as date posted, company, and job title.

The query I run most often is: get all job postings posted after date x.

What partition key should I use so that I can run this query without a scan?

A partition key can only be tested for equality, so using the date as the partition key is no good. The date as the sort key seems best, since I can query a sort key with range conditions such as greater-than.

However, I'm stuck on what a good partition key would be. If I use company or job title, I have to include it in every query, but I want ALL job postings after a certain date, not just those for a specific company or job title.

One approach I thought of is to use the month as the partition key and the date as the sort key. To get, say, the last 14 days, I know I need to hit the partition for this month and possibly the previous month, and then use the sort key to keep only the records from the last 14 days. This seems hackish, though.

Recommended answer

I would probably do something similar to what you describe in your last paragraph: keep a sub-part of the date as the partition key. Use something like the month, or the first N digits of the Unix timestamp, or something similar.
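As a minimal sketch of the two bucketing options mentioned here, the partition key can be derived from the posting time as shown below. The function names and the choice of 4 digits are illustrative assumptions, not part of any DynamoDB API.

```python
from datetime import datetime, timezone


def month_bucket(posted_at: datetime) -> str:
    """Partition key made of year and month, e.g. '2024-01'."""
    return posted_at.astimezone(timezone.utc).strftime("%Y-%m")


def timestamp_prefix_bucket(posted_at: datetime, digits: int = 4) -> str:
    """Partition key made of the first `digits` digits of the Unix timestamp."""
    return str(int(posted_at.timestamp()))[:digits]


# Hypothetical posting time, used only for illustration.
posted = datetime(2024, 1, 4, 12, 0, tzinfo=timezone.utc)
print(month_bucket(posted))             # 2024-01
print(timestamp_prefix_bucket(posted))  # 1704
```

With today's 10-digit timestamps, keeping the first 4 digits gives buckets of 1,000,000 seconds, roughly 11.6 days each. Keeping more digits makes the buckets smaller, which spreads items further but means more queries per date range.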

Note that, depending on how large you make the partitions, you may still need several queries when fetching, say, the last 14 days of posts, because the range can cross a partition boundary (when querying for the last 14 days on January 4, you would also need to query December of the previous year). It should still be usable.
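The boundary case can be sketched as follows, assuming month buckets as the partition key and an ISO date string (YYYY-MM-DD, which sorts lexicographically in date order) as the sort key. The table name `JobPostings` and the attribute names `month` and `posted_date` are hypothetical. The code only builds the request parameters, one Query per bucket, and does not call AWS.

```python
from datetime import date, timedelta


def month_buckets(start: date, end: date) -> list[str]:
    """All 'YYYY-MM' partition keys touched by the range start..end (inclusive)."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


def build_queries(today: date, days: int, table: str = "JobPostings") -> list[dict]:
    """One Query request per month bucket, each restricted by the sort key."""
    since = today - timedelta(days=days)
    return [
        {
            "TableName": table,  # hypothetical table name
            "KeyConditionExpression": "#pk = :pk AND #sk >= :since",
            # Placeholders avoid clashes with DynamoDB reserved words.
            "ExpressionAttributeNames": {"#pk": "month", "#sk": "posted_date"},
            "ExpressionAttributeValues": {
                ":pk": {"S": bucket},
                ":since": {"S": since.isoformat()},
            },
        }
        for bucket in month_buckets(since, today)
    ]


queries = build_queries(date(2024, 1, 4), days=14)
print([q["ExpressionAttributeValues"][":pk"]["S"] for q in queries])
# ['2023-12', '2024-01']
```

Each dictionary is shaped so that it could be passed to the boto3 low-level client as `client.query(**q)`. A real caller would also need to follow `LastEvaluatedKey` to page through large result sets.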

Remember that it's important to choose the partition key so that items are distributed as evenly as possible. Any hack that has a lot of items (or, as is sometimes seen in questions on SO, ALL of them!) share the same partition key to simplify sorting is not a good idea.

You might also want to look at Time to Live (TTL) to have AWS automatically delete items after a certain amount of time. This way, you could keep one table of the newest items and "archive" all other items, which are not queried frequently. Of course, you could do something similar manually by keeping separate tables for new and archived posts, but TTL is pretty neat for auto-expiring items. Querying for all new posts would then simply be a full scan of the table with the new posts.
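As a rough sketch of the TTL idea, each item carries a Number attribute holding its expiry time as Unix epoch seconds, and TTL is enabled on the table for that attribute. The attribute name `expires_at`, the other attribute names, and the 30-day retention are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def posting_item(company: str, title: str, posted_at: datetime,
                 keep_days: int = 30) -> dict:
    """Build a DynamoDB item (low-level attribute format) with a TTL attribute."""
    expires_at = posted_at + timedelta(days=keep_days)
    return {
        "month": {"S": posted_at.strftime("%Y-%m")},
        "posted_date": {"S": posted_at.date().isoformat()},
        "company": {"S": company},
        "title": {"S": title},
        # TTL attribute: Number type, Unix epoch time in seconds.
        "expires_at": {"N": str(int(expires_at.timestamp()))},
    }


# Hypothetical posting, used only for illustration.
item = posting_item("ExampleCorp", "Data Engineer",
                    datetime(2024, 1, 4, tzinfo=timezone.utc))
print(item["expires_at"])  # {'N': '1706918400'}
```

TTL deletion runs in the background, so expired items can remain visible for a while after their expiry time. A scan that must return only unexpired posts would still need to filter on `expires_at`.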
