How to implement cursors for pagination in an API

Question

This is similar to this question, which doesn't have any answers. I've read all about how to use cursors with the Twitter, Facebook, and Disqus APIs, and also this article about how Disqus generally built their cursors, but I still cannot seem to grok the concept of how they work and how to implement a similar solution in my own projects. Can someone explain specifically the different techniques and concepts behind them?

Answer

Let's first use an example to understand why offset pagination fails for large datasets.

Clients provide two parameters: limit, for the number of results, and offset, for the page offset. For example, with offset = 40 and limit = 20, we tell the database to skip the first 40 items and return the next 20.
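
A minimal sketch of offset pagination, assuming Python's built-in sqlite3 module and an in-memory users table. The table layout, sample data and the fetch_page_offset helper are illustrative assumptions, not part of any particular API.

```python
import sqlite3

def fetch_page_offset(conn, limit, offset):
    # Newest first; the database must step past `offset` rows
    # before it can return the `limit` rows we actually want.
    rows = conn.execute(
        "SELECT id FROM users ORDER BY id DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 101)],
)

# offset = 40, limit = 20: skip the 40 newest users, return the next 20.
page = fetch_page_offset(conn, limit=20, offset=40)
print(page[0], page[-1])  # 60 41
```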

Drawbacks:

  • LIMIT OFFSET doesn't scale well for large datasets. The deeper you page into the dataset, the larger the offset, yet the database still has to read up to offset + count rows from disk before discarding the first offset rows and returning only count rows.
  • If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.
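
The duplicate case can be reproduced with a plain Python list standing in for the table, sorted newest first; this is a simplified simulation, not a database benchmark, and page_by_offset is a hypothetical helper. Inserting one new row at the top between two requests shifts every row down by one, so the last item of page 1 comes back on page 2. A deletion shifts rows the other way and silently skips one.

```python
def page_by_offset(items, limit, offset):
    # `items` is sorted newest first, as ORDER BY id DESC would return it.
    return items[offset:offset + limit]

users = list(range(10, 0, -1))                     # ids 10..1, newest first
page1 = page_by_offset(users, limit=3, offset=0)   # [10, 9, 8]

# A new user is written before the client asks for page 2.
users.insert(0, 11)                                # ids 11..1
page2 = page_by_offset(users, limit=3, offset=3)   # [8, 7, 6]

print(page1, page2)
print(set(page1) & set(page2))                     # {8}: returned twice
```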

How do cursors solve this?

Cursor-based pagination works by returning a pointer to a specific item in the dataset. On subsequent requests, the server returns results after the given pointer.

Here the client provides two parameters: limit, and a cursor carrying the next_cursor value returned by the previous response.

Let's assume we want to paginate from the most recent user to the oldest, and that user IDs increase over time, so ordering by id DESC returns the newest users first. On the client's first request, suppose we select the first page with this query:

SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT %limit

Here %limit is the client's limit plus one, so we fetch one more row than the client asked for. The extra row isn't returned in the result set; instead, its ID becomes the next_cursor. If the query returns no extra row, there is no next page, and next_cursor can be omitted or set to null.
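
The limit-plus-one trick for the first page might look like this, sketched with Python's standard sqlite3 module. The schema, sample data and the first_page helper are hypothetical; the cursor is returned as a string only to match the next_cursor value in the example response.

```python
import sqlite3

def first_page(conn, team_id, limit):
    # Ask for one row more than the client wants.
    rows = conn.execute(
        "SELECT id FROM users WHERE team_id = ? ORDER BY id DESC LIMIT ?",
        (team_id, limit + 1),
    ).fetchall()
    ids = [row[0] for row in rows]
    # The extra row, if present, is not returned; its id becomes the cursor.
    next_cursor = str(ids[limit]) if len(ids) > limit else None
    return ids[:limit], next_cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, team_id INTEGER)")
conn.executemany(
    "INSERT INTO users (id, team_id) VALUES (?, ?)",
    [(i, 1) for i in range(1, 8)],
)

users, next_cursor = first_page(conn, team_id=1, limit=3)
print(users, next_cursor)  # [7, 6, 5] 4
```

If the team has limit users or fewer, no extra row comes back and next_cursor is None, which tells the client it has reached the last page.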

The response from the server would be (next_cursor is the user ID of the extra result):

{
   "users": [...],
   "next_cursor": "1234"
}

The client then sends next_cursor back as the cursor parameter in its second request, and the server runs:

SELECT * FROM users
WHERE team_id = %team_id
AND id <= %cursor
ORDER BY id DESC
LIMIT %limit
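
Putting both queries together, here is a sketch of a server-side page function and a client loop that follows next_cursor until the server stops sending one. It assumes SQLite via Python's sqlite3 module; fetch_page and the sample data are hypothetical, and the `?` placeholders stand in for %team_id, %cursor and %limit, which also keeps the client-supplied cursor out of the SQL string.

```python
import sqlite3

def fetch_page(conn, team_id, limit, cursor=None):
    # Later requests start AT the cursor row (id <= cursor), because that
    # row was fetched as the extra (+1) row last time but not returned.
    if cursor is None:
        rows = conn.execute(
            "SELECT id FROM users WHERE team_id = ? ORDER BY id DESC LIMIT ?",
            (team_id, limit + 1),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id FROM users WHERE team_id = ? AND id <= ? "
            "ORDER BY id DESC LIMIT ?",
            (team_id, int(cursor), limit + 1),
        ).fetchall()
    ids = [row[0] for row in rows]
    next_cursor = str(ids[limit]) if len(ids) > limit else None
    return {"users": ids[:limit], "next_cursor": next_cursor}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, team_id INTEGER)")
# Odd ids belong to team 1, even ids to team 2.
conn.executemany(
    "INSERT INTO users (id, team_id) VALUES (?, ?)",
    [(i, 1 if i % 2 else 2) for i in range(1, 21)],
)

# Client side: keep following next_cursor until it is None.
seen, cursor = [], None
while True:
    page = fetch_page(conn, team_id=1, limit=3, cursor=cursor)
    seen.extend(page["users"])
    cursor = page["next_cursor"]
    if cursor is None:
        break
print(seen)  # [19, 17, 15, 13, 11, 9, 7, 5, 3, 1]
```

Because the cursor row is the extra row that was fetched but not returned, `id <= cursor` selects exactly the same rows as `id < (last id returned)`; the sketch follows the `<=` form used in the query above.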

With this, we've addressed the drawbacks of offset-based pagination:

  • Instead of recomputing the window on each request by counting rows from the start of the result set, we always fetch the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the cursor's overall position in the set may change, but the pagination window moves with it.
  • This scales well for large datasets. The WHERE clause fetches only rows whose id is at or below the cursor, that is, below the last id returned on the previous page. This lets the database use an index on the column (for the query above, an index on (team_id, id) fits), so it doesn't have to read any of the rows we've already seen.
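
The stability claim can be checked by repeating the "new user arrives between requests" scenario against a cursor instead of an offset. This is a sketch assuming SQLite through Python's sqlite3 module, with a hypothetical fetch_page helper and the team filter dropped for brevity. Here id is an INTEGER PRIMARY KEY (SQLite's rowid), so the `id <= ?` range can be answered by seeking into the table's b-tree rather than scanning the rows already returned.

```python
import sqlite3

def fetch_page(conn, limit, cursor=None):
    # Keyset query without the team_id filter, to keep the example small.
    if cursor is None:
        rows = conn.execute(
            "SELECT id FROM users ORDER BY id DESC LIMIT ?", (limit + 1,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id FROM users WHERE id <= ? ORDER BY id DESC LIMIT ?",
            (cursor, limit + 1),
        ).fetchall()
    ids = [row[0] for row in rows]
    next_cursor = ids[limit] if len(ids) > limit else None
    return ids[:limit], next_cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 11)])

page1, cursor = fetch_page(conn, limit=3)            # [10, 9, 8], cursor 7
conn.execute("INSERT INTO users (id) VALUES (11)")   # a newer user arrives
page2, _ = fetch_page(conn, limit=3, cursor=cursor)  # [7, 6, 5]

print(page1, page2, set(page1) & set(page2))  # [10, 9, 8] [7, 6, 5] set()
```

The new user with id 11 simply isn't part of this walk; a client that starts again from the first page will see it.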

For a detailed explanation, see this engineering article from Slack.
