Google Datastore - index a date created field without having a hotspot


Question

I am using Google Datastore and need to query it to retrieve some entities, sorted from newest to oldest. My first thought was to have a date_created property containing a timestamp, then index this field and sort on it. The problem with this approach is that it will cause hotspots in the database, according to the best practices page (https://cloud.google.com/datastore/docs/best-practices):

Do not index properties with monotonically increasing values (such as a NOW() timestamp). Maintaining such an index could lead to hotspots that impact Cloud Datastore latency for applications with high read and write rates.

Obviously, sorting data by date is probably the most common sort performed on a database. If I can't index timestamps, is there another way to sort my queries from newest to oldest without hotspots?

Solution

As you note, indexing monotonically increasing values doesn't scale and can lead to hotspots. Whether you are affected by this depends on your particular usage.

As a general rule, the hotspotting point of this pattern is 500 writes per second. If you know you're definitely going to stay under that, you probably don't need to worry.

If you do need more than 500 writes per second, but have an upper limit in mind, you could attempt a sharded approach. Basically, if your upper limit on writes per second is x, then n = ceiling(x/500), where n is the number of shards. When you write your timestamp, prepend random(1, n) at the start. This creates n random key ranges that can each perform up to 500 writes per second. When you query your data, you'll need to issue n queries and do some client-side merging of the result streams.
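The sharded approach can be sketched roughly as follows. This is only an illustration under stated assumptions, not Datastore client code: the function names, the `|` separator, and the zero-padded shard prefix are all hypothetical choices, and the plain Python lists stand in for the n per-shard query results, each of which is assumed to come back already sorted newest first.

```python
import heapq
import math
import random
from datetime import datetime, timezone

WRITES_PER_SHARD = 500  # rule-of-thumb hotspot threshold quoted in the answer


def shard_count(peak_writes_per_second):
    """n = ceiling(x / 500), with at least one shard."""
    return max(1, math.ceil(peak_writes_per_second / WRITES_PER_SHARD))


def format_sharded(shard, created):
    """Prepend the shard number to the timestamp (hypothetical encoding)."""
    stamp = created.isoformat(timespec="microseconds")
    return f"{shard:04d}|{stamp}"


def sharded_timestamp(created, n_shards):
    """Pick a random shard in 1..n and build the value to store and index."""
    return format_sharded(random.randint(1, n_shards), created)


def split_sharded(value):
    """Recover (shard, timestamp) from a stored value."""
    shard, _, stamp = value.partition("|")
    return int(shard), datetime.fromisoformat(stamp)


def merge_newest_first(per_shard_results):
    """Merge n result streams, each sorted newest first, into one stream."""
    return heapq.merge(
        *per_shard_results,
        key=lambda value: split_sharded(value)[1],
        reverse=True,
    )


if __name__ == "__main__":
    n = shard_count(1200)  # three shards for a 1200 writes/s ceiling
    print(n, sharded_timestamp(datetime.now(timezone.utc), n))
```

In real use, each of the n queries would filter on one shard's prefix range and sort descending on the prefixed property; the merge step then only needs to compare the timestamp part, which is what `merge_newest_first` does.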

