Why is it possible to get duplicate results from Azure Search when paging?

Question

Sometimes when using Azure Search's paging there may be duplicate documents in the results. Here is an example of a paging request:

GET /indexes/myindex/docs?search=*&$top=15&$skip=15&$orderby=rating desc

Why is this possible? How can it happen? Are there any consistency guarantees when paging?

Recommended answer

The results of paginated queries are not guaranteed to be stable if the underlying index is changing, or if you are relying on sorting by relevance score. Paging simply changes the value of $skip for each page, but each query is independent and operates on the current view of the data (that is, there is no snapshotting or other consistency mechanism like you’d find in a general-purpose database).

Here is an example of how you might get duplicates. Assume an index with four documents:

  { "id": "1", "rating": 5 }
  { "id": "2", "rating": 3 }
  { "id": "3", "rating": 2 }
  { "id": "4", "rating": 1 }

Now assume you want to page through the results with a page size of two, ordered by rating. You’d execute this query to get the first page:

$top=2&$skip=0&$orderby=rating desc

And get these results:

  { "id": "1", "rating": 5 }
  { "id": "2", "rating": 3 }

Now you insert a fifth document into the index:

{ "id": "5", "rating": 4 }

Shortly thereafter, you execute a query to fetch the second page of results:

$top=2&$skip=2&$orderby=rating desc

And get these results:

  { "id": "2", "rating": 3 }
  { "id": "3", "rating": 2 }

Notice that you’ve fetched document 2 twice. This is because the new document 5 has a greater value for rating than document 2, so it sorts before document 2 and takes its place on the first page, pushing document 2 down to the second page.
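
The walkthrough above can be reproduced with a small in-memory simulation. This is only a sketch: `fetch_page` is a hypothetical helper that mimics $top, $skip and $orderby=rating desc over a plain Python list, and it does not call the Azure Search service.

```python
def fetch_page(index, top, skip):
    """Hypothetical stand-in for one paged query: sort the index as it is
    right now by rating (descending), then apply skip and top."""
    ordered = sorted(index, key=lambda doc: doc["rating"], reverse=True)
    return ordered[skip:skip + top]


# The four documents from the example.
index = [
    {"id": "1", "rating": 5},
    {"id": "2", "rating": 3},
    {"id": "3", "rating": 2},
    {"id": "4", "rating": 1},
]

# First page, fetched before the insert.
page1 = fetch_page(index, top=2, skip=0)

# A fifth document arrives between the two requests.
index.append({"id": "5", "rating": 4})

# Second page, fetched against the changed index.
page2 = fetch_page(index, top=2, skip=2)

page1_ids = [doc["id"] for doc in page1]
page2_ids = [doc["id"] for doc in page2]
duplicates = sorted(set(page1_ids) & set(page2_ids))

print("page 1:", page1_ids)
print("page 2:", page2_ids)
print("seen twice:", duplicates)
```

Because each call sorts whatever the index contains at that moment, the second call sees document 5 in second position and returns document 2 again.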

In situations where you're relying on document score (either you don't use $orderby or you're using $orderby=search.score()), paging can return duplicate results because each query might be handled by a different replica, and that replica may have different term and document frequency statistics. The difference can be enough to change the relative ordering of documents at page boundaries.
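
The replica effect can be illustrated the same way. In this sketch the document IDs and score values are invented for illustration, and `fetch_page` is again a hypothetical helper rather than part of any Azure SDK. The only assumption is that two replicas assign slightly different scores to the same documents.

```python
# Hypothetical relevance scores for the same three documents,
# as computed by two replicas with slightly different statistics.
replica_a = {"A": 1.30, "B": 1.21, "C": 1.20}
replica_b = {"A": 1.30, "B": 1.19, "C": 1.20}


def fetch_page(scores, top, skip):
    """Rank document IDs by score (descending), then apply skip and top."""
    ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return ranked[skip:skip + top]


# Page 1 is served by replica A, page 2 by replica B.
page1 = fetch_page(replica_a, top=2, skip=0)
page2 = fetch_page(replica_b, top=2, skip=2)

print("page 1:", page1)
print("page 2:", page2)
```

Replica A ranks B above C while replica B ranks C above B, so B appears on both pages and C is never returned.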

For these reasons, it’s important to think of Azure Search as a search engine (because it is), and not a general-purpose database.
