Query DynamoDB by date range


Question

I am developing an application that allows users to read books. I am using DynamoDB to store details of the books that each user reads, and I plan to use that data to calculate statistics such as trending books, trending authors, etc.

My current schema is as follows:

user_id | timestamp | book_id | author_id 

user_id is the partition key, and timestamp is the sort key.

The problem I am having is that, with this schema, I can only query the details of the books that a single user (one partition key) has read. That is one of my requirements.

The other requirement is to query all the records that have been created in a certain date range, e.g. records created in the past 7 days. With this schema, I am unable to run that query.

I have looked into many other options, but I haven't figured out a schema that would allow me to run both queries:

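  • Retrieve the records of books read by a single user (can be done).
  • Retrieve the records of books read by all users in the last x days (cannot be done).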

I do not want to run a scan, since it would be expensive. I looked into using a GSI on timestamp, but querying it requires me to specify a hash key, so I cannot query all the records created between two dates.

Recommended answer

One naive solution would be to create a GSI that uses a constant hash key shared by all items and timestamp as the range key. This would let you run the date-range query.
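As a rough illustration of the constant-key idea, the sketch below builds the parameters for a low-level DynamoDB Query call against such an index. The index name, the extra attribute gsi_pk, its constant value "ALL", and the choice to store timestamp as epoch seconds are all assumptions made for this example; none of them come from the question. The name timestamp is aliased through ExpressionAttributeNames because it is a DynamoDB reserved word.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical names, chosen only for this sketch.
GSI_NAME = "gsi_all_by_timestamp"
GSI_HASH_ATTR = "gsi_pk"   # extra attribute holding the same value on every item
CONSTANT_KEY = "ALL"


def build_range_query(table_name, start, end):
    """Build Query parameters for all items created between start and end.

    Assumes timestamp is stored as a number of epoch seconds.
    """
    return {
        "TableName": table_name,
        "IndexName": GSI_NAME,
        # Hash key must be an equality test; the range key may use BETWEEN.
        "KeyConditionExpression": "#pk = :pk AND #ts BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": GSI_HASH_ATTR, "#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":pk": {"S": CONSTANT_KEY},
            ":start": {"N": str(int(start.timestamp()))},
            ":end": {"N": str(int(end.timestamp()))},
        },
    }


# Example: everything created in the past 7 days.
now = datetime.now(timezone.utc)
params = build_range_query("book_reads", now - timedelta(days=7), now)
```

With boto3 installed, these parameters could be passed as boto3.client("dynamodb").query(**params). A single Query call returns at most 1 MB of data, so a real caller would keep requesting pages while the response contains LastEvaluatedKey.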

The problem with this approach is that it is likely to become a scaling bottleneck, because all items with the same hash key are stored in the same partition (and therefore on the same node). One workaround is sharding: create a set of hash keys (e.g. 1 to 10) and assign a random key from this set to every book. Then, to run the date-range query, you issue 10 queries, one per key, and merge the results. You can even make the size of this set dynamic, so that it scales with your data.
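The sharding workaround can be sketched as follows. To keep the example runnable without AWS access, the per-shard query is passed in as a function; query_shard, assign_shard, and NUM_SHARDS are hypothetical names introduced here. In a real application, query_shard would presumably wrap a Query against the GSI for one shard key, and each such query returns items in ascending range-key order by default, which is what the merge step relies on.

```python
import heapq
import random

NUM_SHARDS = 10  # size of the hash-key set; an assumption for this sketch


def assign_shard(num_shards=NUM_SHARDS):
    """Pick a random shard key (1..num_shards) to store on an item at write time."""
    return random.randint(1, num_shards)


def query_all_shards(query_shard, start, end, num_shards=NUM_SHARDS):
    """Run one range query per shard and merge the results by timestamp.

    query_shard(shard, start, end) must return that shard's items
    sorted by ascending "timestamp".
    """
    per_shard = [query_shard(shard, start, end)
                 for shard in range(1, num_shards + 1)]
    # heapq.merge combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*per_shard, key=lambda item: item["timestamp"]))


# In-memory stand-in for the GSI, used only to demonstrate the merge.
fake_items = [{"shard": assign_shard(), "timestamp": t, "book_id": f"b{t}"}
              for t in range(100)]


def fake_query_shard(shard, start, end):
    matches = [i for i in fake_items
               if i["shard"] == shard and start <= i["timestamp"] <= end]
    return sorted(matches, key=lambda i: i["timestamp"])


recent = query_all_shards(fake_query_shard, 93, 99)
```

The per-shard queries are independent of each other, so they could also be issued concurrently; the sequential loop here is only for clarity.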

I would also suggest looking into other tools (not DynamoDB) for this use case, as DynamoDB is not the best tool for data analysis. You might, for example, feed the DynamoDB data into CloudSearch or Elasticsearch and do your analysis there.
