Cosmos DB continuation token size influences whether query returns new documents

Question

I was messing around with the Azure Cosmos DB (via .NET SDK) and noticed something odd.

Normally when I request a query page by page using continuation tokens, I never get documents that were created after the first continuation token had been created. I can observe changed documents, lack of removed (or rather newly filtered out) documents, but not the new ones. However, if I only allow 1kB continuation tokens (the smallest I can set), I get the new documents as well. As long as they end up sorted to the remaining pages, obviously.

This kind of makes sense: with the size limit, I prevent Cosmos DB from including the serialized index lookup and whatnot in the continuation token. The downside is that Cosmos DB has to recreate the resume state for every page I request, which will cost some extra RUs. At least according to this discussion. As a side effect, new documents end up in the result.
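The difference described above can be illustrated with a toy model. This is only a sketch in plain Python, not Cosmos DB's actual implementation: `Token`, `query_page`, `drain`, and the `keep_state` flag are hypothetical names, and the "serialized index lookup" is assumed to be nothing more than a frozen tuple of matching ids. A token that carries the saved lookup keeps resuming from it, while a pruned token forces the lookup to be rebuilt from live data on every page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    last_id: int                      # sort key of the last item returned
    snapshot: Optional[tuple] = None  # saved lookup state; None when pruned


def query_page(store, page_size, token=None, keep_state=True):
    """Return (page, next_token) for documents ordered by id."""
    if token is not None and token.snapshot is not None:
        candidates = token.snapshot        # resume from the saved lookup
    else:
        candidates = tuple(sorted(store))  # rebuild the lookup from live data
    start = token.last_id if token else float("-inf")
    # Deleted documents disappear in both modes because ids are re-checked.
    remaining = [i for i in candidates if i > start and i in store]
    page = remaining[:page_size]
    if len(remaining) <= page_size:
        return page, None                  # last page, no continuation
    return page, Token(page[-1], candidates if keep_state else None)


def drain(keep_state):
    store = {1, 2, 3, 5, 6}
    page, token = query_page(store, 2, keep_state=keep_state)
    results = list(page)
    store.add(4)      # created after the first token was issued
    store.discard(5)  # deleted after the first token was issued
    while token is not None:
        page, token = query_page(store, 2, token, keep_state)
        results.extend(page)
    return results


print(drain(keep_state=True))   # [1, 2, 3, 6]
print(drain(keep_state=False))  # [1, 2, 3, 4, 6]
```

In this model the deleted document 5 vanishes either way, but the new document 4 only shows up when the token is pruned, and only because it sorts into a page that has not been fetched yet.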

Now, I actually have a couple of questions in regards to this.

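  1. Is this behavior reliable? I would love to see some documentation on this.
  2. Are the RUs saved by larger continuation tokens significant?
  3. Is there another way to get new documents included in the result?
  4. Is my assumption completely wrong?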

Recommended answer

I am from the CosmosDB Engineering Team.

  1. Is this behavior reliable? I would love to see some documentation on this.

We brought in this feature (limiting continuation token size) due to an ask from customers to help in reducing the response continuation size. We are of the opinion that it's too much detail to expose the effects of pruning the continuation, since for most customers the subtle behavior change shouldn't matter.

  2. Are the RUs saved by larger continuation tokens significant?

This depends on the amount of work done in producing the state from the index. For example, if we had to evaluate a range predicate (e.g. _ts > some discrete second), then the RU saved could be significant, since we potentially avoid scanning a whole bunch of index keys corresponding to _ts (this could be O(number of documents), assuming the worst case of having inserted at most 1 document per second). In this scenario, assuming X continuations, we save (X - 1) * O(number of documents) worth of work.
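As a rough worked example of that estimate, the sketch below counts index keys scanned under a deliberately simplified assumption: rebuilding the state costs exactly one scan over all matching keys, and a token that carries the state avoids every rescan after the first. The function name `keys_scanned` and the numbers are made up for illustration; real RU charges depend on much more than this count.

```python
def keys_scanned(matching_keys: int, continuations: int, state_in_token: bool) -> int:
    """Index keys scanned over a whole paged query (toy cost model)."""
    if state_in_token:
        return matching_keys              # scanned once, then resumed from the token
    return matching_keys * continuations  # rescanned for every continuation


n = 100_000  # index keys matching the range predicate, e.g. _ts > some second
x = 10       # number of continuations (pages) the query is split into

saved = keys_scanned(n, x, state_in_token=False) - keys_scanned(n, x, state_in_token=True)
print(saved)  # 900000, i.e. (x - 1) * n
```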

  3. Is there another way to get new documents included in the result?

No, not unless you force CosmosDB to re-evaluate the index on every continuation by setting the continuation token size limit header to 1 (kB). Typically queries are meant to be executed fairly quickly over continuations, so the chance of users seeing new documents should be fairly small. Ideally we should implement snapshot isolation to retrieve results with the session token from the first continuation, but we haven't done this yet.

  4. Is my assumption completely wrong?

Your assumptions are correct :)
