通过ID进行Azure DocumentDB查询的速度非常慢 [英] Azure DocumentDB Query by Id is very slow

查看:57
本文介绍了通过ID进行Azure DocumentDB查询的速度非常慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有16个GB的存储空间,其中包含2个分区.当我通过ID来查询文档时,它的运行速度非常慢.但是通过索引字段进行查询的速度很快.两者都是跨分区查询,如果我在查询中传递分区键,它很快,但是分区键并不总是可用于我的查询.在Azure Portal中使用.NET SDK和Document Explorer查询获得了类似的结果.

I got a 16GB collection with 2 partitions. When I query a document by it's Id, it is very slow. But querying by an indexed field is fast. Both are cross-partition query and if I pass partition key with the query it is fast but partition key is not always available for my query. Got similar results using .NET SDK and Document Explorer Query in Azure Portal.

该集合具有自定义索引策略,但据我所知,您不需要为Id编制索引,否则可能甚至无法建立索引.

The collection has custom indexing policy but as far as I know you don't need to index Id or it may not be even possible.

这是我的查询及其相应的请求费用.

Here are my queries and their corresponding request charges.

SELECT * FROM c where c.id = 'unique-id-123'
-- Request Charge: 344940.79 RUs, Document Count: 1

SELECT * FROM c WHERE c.otherId = 'NOT-so-uniqueId-123'
-- Request Charge: 5.08 RUs, Document Count: 3

如您所知,Id是唯一的,因此查询返回1个文档,而第二个查询由otherId过滤,而otherId不是唯一的,并返回3个文档.还要注意,第一个查询的RU消耗非常高.

As you know, Id is unique so the query returns 1 document while the second query is filtered by otherId which is not so unique and returns 3 documents. Also notice the insanely high RUs consumption with the first query.

那么为什么第二次查询比通过ID更快?

So why the 2nd query is faster than by Id?


更新:
这是上述查询的收集指标.


Update:
Here are the gathered metrics for the above queries.

按ID查询:

Read 1 records in 1497 ms, 339173.109 RU, Size: 6873022 KB
QueryPreparationTime(ms): CompileTime = 2, LogicalBuildTime = 0,
     PhysicalPlanBuildTime = 0, OptimizationTime = 0
QueryEngineTime(ms): DocumentLoadTime = 1126, IndexLookupTime = 0,
     RuntimeExecutionTimes = 356, WriteOutputTime = 0

按索引字段查询

Read 4 records in 2 ms, 7.56 RU, Size: 9 KB
QueryPreparationTime(ms): CompileTime = 0, LogicalBuildTime = 0, 
     PhysicalPlanBuildTime = 0, OptimizationTime = 0
QueryEngineTime(ms): DocumentLoadTime = 0, IndexLookupTime = 1, 
     RuntimeExecutionTimes = 0, WriteOutputTime = 0

这些证明了ID的查询正在执行表扫描,因为花费的大部分时间都来自DocumentLoadTime,而IndexLookupTime没有任何值.
但是我认为ID应该是主键,并且根据@ andrew-liu的 answer .

These proves that query by Id is doing table scan as most of the time spent was from DocumentLoadTime and no value for IndexLookupTime.
But I thought Id should be the primary key and indexed by default as per this answer by @andrew-liu.

推荐答案

Microsoft支持已得到响应,他们已经解决了该问题.他们为集合添加了IndexVersion 2.不幸的是,它尚不能从门户网站获得,并且新创建的帐户/集合仍未使用新版本.您必须与Microsoft支持部门联系以对您的帐户进行更改.

Microsoft support responded and they've resolved the issue. They've added IndexVersion 2 for the collection. Unfortunately, it is not yet available from the portal and newly created accounts/collection are still not using the new version. You'll have to contact Microsoft Support to made changes to your accounts.

这是索引版本2的集合的新结果,并且有了很大的改进.

Here are the new results from a collection with index version 2 and there's a massive improvement.

SELECT * FROM c where c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 344,940.79 RUs
-- Index Version 2: Request Charge: 3.31 RUs

SELECT * FROM c WHERE c.indexedField = 'value' AND c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 150,666.22 RUs 
-- Index Version 2: Request Charge: 5.65 RUs

这篇关于通过ID进行Azure DocumentDB查询的速度非常慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆