为什么在WHERE子句中包含Cosmos SQL API查询的分区键会增加某些查询的已用RU? [英] Why does including partition key in WHERE clause to Cosmos SQL API query increase consumed RUs for some queries?

查看:61
本文介绍了为什么在WHERE子句中包含Cosmos SQL API查询的分区键会增加某些查询的已用RU?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想针对消耗的RU优化我的Azure Cosmos DB SQL API查询(部分是为了减少429个响应的频率).

I would like to optimise my Azure Cosmos DB SQL API queries for consumed RUs (in part in order to reduce the frequency of 429 responses).

我特别认为在WHERE子句中包含分区键会减少消耗的RU(例如,我阅读

Specifically I thought that including the partition key in WHERE clauses would decrease consumed RUs (e.g. I read https://docs.microsoft.com/en-us/azure/cosmos-db/optimize-cost-queries and https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview which made me think this).

但是,当我跑步时

SELECT TOP 1 * 
FROM c
WHERE c.Field = "some value"
AND c.PartitionKeyField = "1234"
ORDER BY c.TimeStampField DESC

它消耗6 RU.

例如没有分区键的

SELECT TOP 1 * 
FROM c
WHERE c.Field = "some value"
ORDER BY c.TimeStampField DESC

它消耗了5.76卢布-即更便宜.

It consumes 5.76 RUs - i.e. cheaper.

(以上数字根据所选择的确切文档而有所不同,第二个查询总是更便宜,并且我已经针对最小和最大分区进行了测试.)

(whilst there is some variation in the above numbers depending on the exact document selected, the second query is always cheaper, and I have tested against both the smallest and largest partitions.)

我的数据库目前有大约40万个文档和29个分区(预计都将增长).最大的分区包含大约150,000个文档(可能不会进一步扩展).

My database currently has around 400,000 documents and 29 partitions (both are expected to grow). Largest partition has around 150,000 documents (unlikely to grow further than this).

以上结果向我表明,我不应该在WHERE子句中为该查询传递分区键.请问有人可以解释为什么会这样吗?从文档中我认为相反的说法应该是正确的?

The above results indicate to me that I should not pass the partition key in the WHERE clause for this query. Please could someone explain why this is so as from the documentation I thought the opposite should be true?

推荐答案

可能有一些原因,它取决于查询引擎决定使用哪个索引,或者根本没有索引.

There might a few reasons and it depends on which index the query engine decides to use or if there is an index at all.

我首先要说的是,此容器中可能没有太多数据,因为没有分区键的查询会随着容器的增大而变得越来越昂贵,尤其是当它们跨越物理分区时.

First thing I can say is that there is likely not much data in this container because queries without a partition key get progressively more expensive the larger the container, especially when they span physical partitions.

如果分区键上没有索引,并且在经过c.field过滤后对其进行了扫描,则第一个索引可能会更昂贵.

The first one could be more expensive if there is no index on the partition key and did a scan on it after filtering by the c.field.

根据是否有复合索引以及是否使用复合索引,它的价格也可能更高.

It could also be more expensive depending on whether there is a composite index and whether it used it.

尽管您不能为小型容器获取查询指标并进行推断,但确实如此.唯一的测量方法是将足够的数据放入容器中.同样,这里的数量是如此之小,以至于不值得优化.我会将大量数据放入您希望在生产中使用过的容器中,然后重新运行查询.

Really though you cannot take query metrics for small containers and extrapolate. The only way to measure is to put enough data into the container. Also the amount here is so small that it's not worth optimizing over. I would put the amount of data into this container you expect to have once in production and re-run your queries.

最后,关于度量和优化,应用了pareto原理.您会发疯追逐每个优化.找到您的高并发查询,并专注于这些.

Lastly, with regards to measuring and optimizing, pareto principle applies. You'll go nuts chasing down every optimization. Find your high concurrency queries and focus on those.

希望这会有所帮助.

这篇关于为什么在WHERE子句中包含Cosmos SQL API查询的分区键会增加某些查询的已用RU?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆