How to speed up a Cosmos DB aggregate query?

Question

Our Cosmos DB aggregate query seems slow and costs a lot of RUs. Here are the details (see also the screenshot below): 2.4 s and 3222 RUs to count a result set of 414k records. And that is for just one count. Normally we would want to sum many fields at once (possible only within a single partition), but performance for that is much worse.

There are 2 million records in this collection. We are using Cosmos DB with the SQL API. This particular collection is partitioned by country_code; there are 414,732 records in France ("FR") and the remainder are in the US. Document size averages 917 bytes, with a minimum of maybe 800 bytes and a maximum of about 1300 bytes.

Note that we have also tried a much finer-grained partition key such as device_id (there are 2 million of them, one document per device here), which gives worse results for this query. The c.calcuated.flag1 field just represents a "state" that we want to keep a count of (we actually have 8 states that I'd like to summarize).
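The query itself only appeared in the screenshot, so the following is a minimal sketch of what a single-partition count like the one described might look like, not the exact query from the question. It assumes the `azure-cosmos` Python SDK and assumes that `c.calcuated.flag1` holds a boolean; `build_count_query` and `run_count` are hypothetical helper names introduced here for illustration.

```python
from typing import Any, Dict


def build_count_query(country_code: str, flag_value: Any = True) -> Dict[str, Any]:
    """Build a parameterized COUNT query for one country.

    Assumption: flag1 is compared against a boolean; the real field type
    is not stated in the question.
    """
    return {
        "query": (
            "SELECT VALUE COUNT(1) FROM c "
            "WHERE c.country_code = @country AND c.calcuated.flag1 = @flag"
        ),
        "parameters": [
            {"name": "@country", "value": country_code},
            {"name": "@flag", "value": flag_value},
        ],
    }


def run_count(container: Any, country_code: str, flag_value: Any = True) -> int:
    """Run the count against a single logical partition.

    `container` is expected to be an azure-cosmos ContainerProxy. Passing
    partition_key keeps the query inside one partition (country_code is the
    partition key in the question), so no cross-partition fan-out is needed.
    """
    spec = build_count_query(country_code, flag_value)
    results = container.query_items(
        query=spec["query"],
        parameters=spec["parameters"],
        partition_key=country_code,
    )
    # SELECT VALUE COUNT(1) yields a single scalar result.
    return next(iter(results))
```

With a real container client this would be called as `run_count(container, "FR")`. Scoping the query to one partition key value is what makes it a single-partition aggregate, which is the case discussed throughout this thread.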

The indexing on this collection is the default, which uses "consistent" index mode and indexes all fields (including range indexes for Number and String). The RU setting is 20,000, and there is no other activity on the DB.

So let me know your thoughts on this. Can Cosmos DB reasonably be used to get a few sums or counts on fields without ramping up our RU charges and taking a long time? While 2.4 s is not awful, we really need sub-second queries for this kind of thing. Our application (IoT based) often needs individual documents, but it also sometimes needs these kinds of counts across all documents in a country.

Is there a way to improve performance?

Recommended answer

The Cosmos DB team has now made some significant changes to aggregation performance and to how indexes are used. This is their indexing "v2" strategy, and it was only recently rolled out (it may not be available to all accounts yet; contact MSFT if you have an older DB that needs upgrading).

You can compare the new results to the picture I originally posted.

You'll note that the document load time now shows as 0 ms and the retrieved document size is 0 bytes. I can confirm that the load time really is quite fast now, so it is possible that it is under 1 ms when measured on the server side. A document size of 0 also makes more sense, since no documents need to be retrieved for this query (the count is based on the index alone).

Finally, you can see that the RUs dropped from 3222 to 7.4, a pretty drastic difference.
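To put those numbers in perspective, here is a short back-of-the-envelope calculation using only the figures quoted in this thread. The "data touched" figure is an estimate that assumes every matching document was loaded at the average size before the change, which the thread implies but does not state outright.

```python
# Figures quoted in the question and answer.
docs_in_partition = 414_732   # documents with country_code "FR"
avg_doc_bytes = 917           # average document size
ru_before = 3222              # RU charge for the original count
ru_after = 7.4                # RU charge after the indexing change

# How many times cheaper the count became.
ru_reduction = ru_before / ru_after

# Cost per counted document before the change.
ru_per_doc_before = ru_before / docs_in_partition

# Rough volume of data if every matching document had to be loaded
# (assumption: all documents loaded at the average size).
approx_data_mb = docs_in_partition * avg_doc_bytes / 1_000_000

print(f"RU reduction: about {ru_reduction:.0f}x")           # about 435x
print(f"RU per document before: {ru_per_doc_before:.4f}")   # 0.0078
print(f"Data touched before: about {approx_data_mb:.0f} MB")  # about 380 MB
```

Seen this way, the old charge is consistent with scanning a few hundred megabytes of documents, while the new charge is consistent with answering the count from the index alone.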

Summing multiple columns at once within a single partition is also quite performant now. We can do about 8 sums at once across 2 million documents for ~50 RUs, and it takes about 20-70 ms when measured from a function API endpoint (so that includes network time).
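As an illustration of what "several sums at once within a single partition" could look like, here is a hedged sketch that builds one query with multiple SUM projections. The field names other than `flag1` are hypothetical (the thread only mentions that there are 8 states), and `build_multi_sum_query` is a helper name invented for this example.

```python
from typing import Any, Dict, List


def build_multi_sum_query(country_code: str, fields: List[str]) -> Dict[str, Any]:
    """Build one query that sums several fields under c.calcuated.

    Assumption: each state is stored as a numeric field under c.calcuated;
    only flag1 is named in the question.
    """
    if not fields:
        raise ValueError("at least one field is required")
    for name in fields:
        # Field names are interpolated into the query text, so only accept
        # plain identifiers.
        if not name.isidentifier():
            raise ValueError(f"invalid field name: {name!r}")

    projections = ", ".join(f"SUM(c.calcuated.{name}) AS {name}" for name in fields)
    return {
        "query": f"SELECT {projections} FROM c WHERE c.country_code = @country",
        "parameters": [{"name": "@country", "value": country_code}],
    }


# Hypothetical field names flag1..flag8 standing in for the 8 states.
spec = build_multi_sum_query("US", [f"flag{i}" for i in range(1, 9)])
print(spec["query"])
```

The resulting query text and parameters can be passed to the SDK's query call together with the partition key value, so that all sums are computed inside one partition, which is the case reported as fast here.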

More work still needs to be done by the Cosmos DB team to allow cross-partition multi-column aggregations, but the improvements we have now are quite promising.
