NoSQL: Getting the latest values from tables DynamoDB/Azure Table Storage

Problem description

I have a little problem that needs some suggestions:

  • Let's say we have a few hundred data tables with a few tens of millions of rows each.
  • Each data table maps a timestamp (the key) to a value.
  • Each data table is written to once every second.

The latest entry of each table should be quickly obtainable and will most likely be queried the most (sort of like "follow data in real time"). Since there is no 'Last()' or similar, I was thinking of creating another table, "LatestValues", where the latest entry of each data table is updated for faster retrieval. This, however, would add an extra update for each write operation. Also, most of the traffic would be concentrated on this table (good/bad?). Is there a better solution for this, or am I missing something?

Also, let's say we want to query for the values in the data tables. Since scanning is obviously out of the question, is the only option left to create a secondary index by duplicating the data, effectively doubling the storage requirements and the number of write operations? Any other solutions?

I'm primarily looking at DynamoDB and Azure Table Storage, but I'm also curious how BigTable handles this.

Solution

I just published an article today with some common "recipes" for DynamoDB. One of them is "Storing article revisions, getting always the latest". I think it might interest you :)

In a nutshell, you can get the latest item using Query(hash_key=..., ScanIndexForward=False, limit=1). Setting ScanIndexForward to false returns items in descending range key order, so the first item is the newest one.

But this assumes you have a range_key defined.

With Scan, there is no such parameter as ScanIndexForward=false, and in any case you cannot rely on the order, as the data is spread over partitions and the Scan request is then load balanced.

To achieve your goal with DynamoDB, you may "split" your timestamp this way:

  1. hash_key: date
  2. range_key: time or full timestamp, as you prefer
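As a minimal sketch of the split described above, the helper below turns one timestamp into a date part for the hash key and a full ISO 8601 timestamp for the range key. The attribute names `day` and `ts` and the function name `split_timestamp` are assumptions made for illustration; they do not come from the original answer.

```python
from datetime import datetime, timezone


def split_timestamp(ts: datetime) -> dict:
    """Split a timestamp into a hash key (date) and a range key (full timestamp).

    Assumption: the attribute names "day" and "ts" are hypothetical.
    A naive datetime is interpreted as local time by astimezone().
    """
    ts = ts.astimezone(timezone.utc)
    return {
        # Hash key: one partition per calendar day (UTC).
        "day": ts.strftime("%Y-%m-%d"),
        # Range key: fixed-width ISO 8601 strings sort chronologically.
        "ts": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Storing the range key as a fixed-width UTC string keeps lexicographic order and chronological order identical, which is what a descending query relies on.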

Then you can use the "trick" of Query + Limit=1 + ScanIndexForward=false.
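A rough sketch of that query in Python follows. It assumes a table object that exposes a `query(**kwargs)` method in the style of boto3's DynamoDB `Table` resource, and a hash key attribute named `day`; both are assumptions for illustration, not part of the original answer. The hash key is referenced through a `#pk` placeholder in case the attribute name collides with a DynamoDB reserved word.

```python
from typing import Any, Optional


def latest_item_query(day: str) -> dict:
    """Build Query arguments that return only the newest item for one day.

    Assumption: the hash key attribute is named "day" (hypothetical).
    """
    return {
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "day"},
        "ExpressionAttributeValues": {":pk": day},
        # Descending range key order, so the newest item comes first.
        "ScanIndexForward": False,
        # Stop after the first (newest) item.
        "Limit": 1,
    }


def latest_item(table: Any, day: str) -> Optional[dict]:
    """Return the newest item stored under `day`, or None if there is none."""
    response = table.query(**latest_item_query(day))
    items = response.get("Items", [])
    return items[0] if items else None
```

With boto3 this might be called as `latest_item(boto3.resource("dynamodb").Table("my_table"), "2013-01-02")`, where the table name is a placeholder. Right after midnight the current day's partition may still be empty, so a caller would probably want to retry with the previous day when `None` comes back.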
