Querying large collections in Cosmos DB
Question
We currently have a very large collection in our Document DB. We want to be able to filter the collection based on some fields in its documents.
When I run this query via the portal, it takes a really long time because there is so much data. When I run it via a function app, it cuts out after five minutes due to a time-out.
What is the best way to perform this search? Is it possible to perform it via Application Insights or something similar? I am aware that the query itself can take a long time, but it shouldn't be blocking. Querying via the portal blocks all other actions.
Thanks in advance. Regards
Recommended answer
Firstly, you need to know that Document DB imposes limits on response page size. This link summarizes some of those limits: Azure DocumentDb Storage Limits - what exactly do they mean?
Secondly, if you want to query a large amount of data from Document DB, you have to consider query performance. Please refer to this article: Tuning query performance with Azure Cosmos DB.
By looking at the Document DB REST API, you can see two important parameters that have a significant impact on query operations: x-ms-max-item-count and x-ms-continuation.
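As a rough illustration of where those two headers sit in a REST query request, here is a minimal sketch. The helper name build_query_request is hypothetical, and the required authentication headers (authorization, x-ms-date, x-ms-version) are deliberately left out, so this is not a complete request on its own.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_query_request(
    query: str,
    parameters: Optional[List[Dict[str, Any]]] = None,
    max_item_count: int = 100,
    continuation: Optional[str] = None,
) -> Tuple[Dict[str, str], str]:
    """Build headers and JSON body for a paged SQL query (hypothetical helper).

    Authentication headers are intentionally omitted; add them before sending.
    """
    headers = {
        # Marks the POST as a query rather than a document create.
        "Content-Type": "application/query+json",
        "x-ms-documentdb-isquery": "True",
        # Upper bound on the number of items returned in one page.
        "x-ms-max-item-count": str(max_item_count),
    }
    if continuation:
        # Token from the previous response; resumes where that page ended.
        headers["x-ms-continuation"] = continuation
    body = json.dumps({"query": query, "parameters": parameters or []})
    return headers, body
```

The first request is sent without x-ms-continuation. Each later request copies the x-ms-continuation value from the previous response headers, until a response arrives without one.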
The Azure portal doesn't automatically optimize your SQL for you, so you need to handle this in the SDK or the REST API.
You could set the value of Max Item Count and paginate your data using continuation tokens. The Document DB SDK supports reading paginated data seamlessly. You could refer to the Python snippet below:
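# client, collection_link and query are assumed to be defined already
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
# fetch the first page; the result is indexed as [1] for the response headers
results_1 = q._fetch_function({'maxItemCount': 10})
# the continuation token is a string representing a JSON object
token = results_1[1]['x-ms-continuation']
# pass the token back to fetch the next page
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})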
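The two-page snippet above can be generalized into a loop that keeps following the continuation token until none is returned. The following is only a sketch: iter_pages and make_fake_fetch are hypothetical names, and the fetch callable is assumed to take an options dict and return a (documents, headers) pair, mirroring how _fetch_function is used above. Note that _fetch_function is a private attribute of the SDK, so its behavior may differ between versions.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Page = Tuple[List[Any], Dict[str, str]]


def iter_pages(
    fetch_page: Callable[[Dict[str, Any]], Page],
    max_item_count: int = 10,
) -> Iterator[List[Any]]:
    """Yield one page of documents at a time until no token is returned."""
    options: Dict[str, Any] = {'maxItemCount': max_item_count}
    while True:
        docs, headers = fetch_page(dict(options))
        yield docs
        token = headers.get('x-ms-continuation')
        if not token:
            return  # no token means the last page has been read
        options['continuation'] = token


def make_fake_fetch(items: List[Any]) -> Callable[[Dict[str, Any]], Page]:
    """In-memory stand-in for the real fetch function, for local testing only."""
    def fetch(options: Dict[str, Any]) -> Page:
        start = int(options.get('continuation', 0))
        end = start + options['maxItemCount']
        headers = {'x-ms-continuation': str(end)} if end < len(items) else {}
        return items[start:end], headers
    return fetch


if __name__ == '__main__':
    for page in iter_pages(make_fake_fetch(list(range(25))), max_item_count=10):
        print(len(page))
```

Against a real collection, the assumption is that q._fetch_function would be passed in place of the fake. Because pages are yielded one at a time, the caller can stop early and store the last token to resume later, which may help when a function app runs into its time-out.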
Hope it helps you.