AWS DynamoDB v2: Do I need secondary index for alternative queries?

Question

I need to create a table that would contain a slice of data produced by a continuously running process. This process generates messages that contain two mandatory components, among other things: a globally unique message UUID, and a message timestamp.

Those messages would be later retrieved by the UUID.

In addition, on a regular basis I would need to delete all messages from that table that are too old, i.e. whose timestamps are more than X away from the current time.

I've been reading the DynamoDB v2 documentation (e.g. Local Secondary Indexes) trying to figure out how to organize my table and whether or not I need a secondary index to search for the messages to delete. There might be a simple answer to my question, but I am somewhat confused...

So should I just create a table with the UUID as the hash key and messageTimestamp as the range key (together with a "message" attribute that would contain the actual message), and then not create any secondary indexes? In the examples that I've seen, the hash key was something that was not unique (e.g. ForumName under the above link). In my case, the hash key would be unique. I am not sure whether that makes any difference.

And if I create the table with hash and range keys as described, and without a secondary index, then how would I query for all messages that fall in a certain time range, regardless of their UUIDs?

Recommended answer

We've wrestled with this as well. The best solution we've come up with is to create a second table for storing the time series data. To do this:

1) Use the date plus "bucket" id for a hash key
You could just use the date, but then I'm guessing today's date would become a "hot" key, one that is written with excessive frequency. This can create a serious bottleneck, because the throughput available to a particular DynamoDB partition is equal to the total provisioned throughput divided by the number of partitions. That means if all your writes go to a single key (today's key) and you have provisioned 20 writes per second spread over 20 partitions, your effective throughput would be 1 write per second. Any requests beyond this would be throttled. Not a good situation.
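To make the arithmetic concrete, here is a minimal sketch of the model described above, in which provisioned throughput is assumed to be split evenly across partitions. The function name is hypothetical and the even split is the answer's own simplification, not a statement of how DynamoDB allocates capacity internally.

```python
def per_partition_throughput(provisioned_per_second: float, partitions: int) -> float:
    """Throughput one partition can absorb if capacity is split evenly (assumed model)."""
    if partitions <= 0:
        raise ValueError("partitions must be a positive integer")
    return provisioned_per_second / partitions


# The example from the text: 20 writes per second spread over 20 partitions.
# If every write targets a single hot key, only one partition's share is usable.
print(per_partition_throughput(20, 20))  # 1.0
```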

The bucket can be a random number from 1 to n, where n should be greater than the number of partitions used by the underlying DB. Determining n is a bit tricky of course because Dynamo does not reveal how many partitions it uses. But we are currently working with the upper limit of 200 based on the example found here. The writeup at this link was the basis for our thinking in coming up with this approach.
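A hash key of this shape could be built roughly as follows. This is only a sketch: the `#` separator, the ISO date format, and the function names are assumptions made for illustration, and 200 is simply the upper limit mentioned above.

```python
import random
from datetime import date

BUCKET_COUNT = 200  # upper limit quoted in the answer; tune for your own table


def make_hash_key(day: date, bucket: int) -> str:
    """Combine a date and a bucket id into one hash key, e.g. '2013-05-01#7'."""
    return f"{day.isoformat()}#{bucket}"


def random_hash_key(day: date, bucket_count: int = BUCKET_COUNT) -> str:
    """Pick a random bucket in 1..bucket_count so writes for one day spread out."""
    bucket = random.randint(1, bucket_count)  # randint is inclusive on both ends
    return make_hash_key(day, bucket)


print(make_hash_key(date(2013, 5, 1), 7))  # 2013-05-01#7
```

Because the bucket is random, a reader cannot know which bucket holds a given message, which is why step 3 has to query every bucket for each day.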

2) Use the UUID as the range key

3) Query records by issuing queries for each day and bucket. This may seem tedious, but it is more efficient than a full scan. Another possibility is to use Elastic MapReduce jobs, but I have not tried that myself yet, so I cannot say how easy or effective it is to work with.
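The per-day, per-bucket fan-out in step 3 might be enumerated like this. It is a sketch under assumptions: the `date#bucket` key format and the function name are illustrative, and the code only generates the hash keys. Each key would then be used in one DynamoDB Query against the time series table, and the UUIDs returned could be used to delete the expired messages.

```python
from datetime import date, timedelta
from typing import Iterator


def hash_keys_for_range(first_day: date, last_day: date,
                        bucket_count: int = 200) -> Iterator[str]:
    """Yield every 'date#bucket' hash key between first_day and last_day, inclusive."""
    day = first_day
    while day <= last_day:
        for bucket in range(1, bucket_count + 1):
            yield f"{day.isoformat()}#{bucket}"
        day += timedelta(days=1)


# Two days with 3 buckets each means 6 separate queries.
keys = list(hash_keys_for_range(date(2013, 5, 1), date(2013, 5, 2), bucket_count=3))
print(len(keys))  # 6
print(keys[0], keys[-1])  # 2013-05-01#1 2013-05-02#3
```

The number of queries grows as days times buckets, so a larger bucket count buys write spreading at the cost of more read requests during cleanup.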

We are still figuring this out ourselves, so I'm interested to hear others' comments. I also found this presentation very helpful in thinking through how best to use Dynamo: Falling In and Out Of Love with Dynamo

-John
