MySQL - Pagination of filtered rows


Question

I have a REST service that returns rows from a database table depending on the current page and the number of results per page.

When the results are not filtered, this is easy: I run a SELECT with WHERE id >= (page - 1) * perPage + 1 and LIMIT it to perPage.

The problem arises when I try to paginate filtered results, e.g. if I choose to return only the rows WHERE type = someType.

In that case, the first match on the first page might be at id 7 and the last at id 5046. The first match on the second page might then be at id 7302 and the last at id 12430, and so on.
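The mismatch described above can be reproduced on a small scale. The following is a minimal sketch, not the asker's actual schema: the `items` table, its data, and the use of Python's built-in `sqlite3` module as a stand-in for MySQL are all assumptions made for illustration. It shows that the unfiltered id arithmetic returns the first page again when a filter is applied.

```python
import sqlite3

# Hypothetical table and data, for illustration only (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT)")
# Every 7th row has type 'a'; all other rows have type 'b'.
conn.executemany(
    "INSERT INTO items (id, type) VALUES (?, ?)",
    [(i, "a" if i % 7 == 0 else "b") for i in range(1, 101)],
)

per_page = 5
query = "SELECT id FROM items WHERE type = ? AND id >= ? ORDER BY id LIMIT ?"

# Page 1: starting from id 1 works.
page_one = [r[0] for r in conn.execute(query, ("a", 1, per_page))]

# Page 2 using the unfiltered formula: start id = (page - 1) * perPage + 1 = 6.
start_id = (2 - 1) * per_page + 1
naive_page_two = [r[0] for r in conn.execute(query, ("a", start_id, per_page))]

print(page_one)
print(naive_page_two)  # same rows as page_one: the formula ignores the filter
```

Because the matching ids are sparse, the computed start id of 6 still lies before the first match, so the query returns the first page's rows a second time.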

For the first page of filtered results, I can simply start from id 1 and LIMIT to perPage. For the second page and beyond, though, I need to know the id of the last matched row on the previous page or, even better, the id of the first matched row on the current page, or some other such indicator.

How do I do this efficiently? It has to work on tables with millions of rows, so fetching all the rows and paginating from there is not an option.

The idea is something like this:

SELECT ... FROM ... WHERE filterKey = filterValue AND id >= id_of_first_match_in_current_page

with id_of_first_match_in_current_page being the mystery.

Recommended answer

You can't know what the first id on a given page is, because id values are not necessarily consecutive. In other words, there can be gaps in the sequence, so the fifth page of 100 rows doesn't necessarily start at id 401. It could start at id 527, for example; it's impossible to know in advance.

Put another way: an id is a value, not a row number.

If your client advances through the pages in ascending order, one possible solution is for each REST request to fetch its data and note the greatest id value on that page, and for the next REST request to pass that value back so the query selects only larger id values.

SELECT ... FROM ... WHERE filterKey = filterValue
AND id > id_of_last_match_of_previous_page
ORDER BY id LIMIT perPage
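The approach above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the `items` table, the helper name `fetch_page_after`, and the use of Python's `sqlite3` module in place of MySQL are all hypothetical. It also assumes ids are positive, so 0 can serve as the starting cursor.

```python
import sqlite3

def fetch_page_after(conn, type_value, last_id, per_page):
    """Return up to per_page matching ids that are greater than last_id."""
    rows = conn.execute(
        "SELECT id FROM items WHERE type = ? AND id > ? ORDER BY id LIMIT ?",
        (type_value, last_id, per_page),
    )
    return [r[0] for r in rows]

# Hypothetical data: every 7th row out of 100 has type 'a'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO items (id, type) VALUES (?, ?)",
    [(i, "a" if i % 7 == 0 else "b") for i in range(1, 101)],
)

# The client walks forward, passing back the greatest id it has seen so far.
pages = []
last_id = 0  # assumption: all ids are positive
while True:
    ids = fetch_page_after(conn, "a", last_id, 5)
    if not ids:
        break
    pages.append(ids)
    last_id = ids[-1]  # becomes id_of_last_match_of_previous_page

for number, ids in enumerate(pages, start=1):
    print(number, ids)
```

In a REST service the `last_id` value would typically travel in the response and come back as a request parameter, so the server does not need to remember each client's position.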

But if your REST requests can ask for any arbitrary page, this solution doesn't work, because it depends on the previous page having been fetched already.

Another solution is to use the LIMIT <x> OFFSET <y> syntax, which lets you request any arbitrary page. LIMIT <y>, <x> works the same way, but note that x and y appear in the opposite order in the two forms: x is the number of rows to return and y is the number of rows to skip.

LIMIT ... OFFSET is not efficient when the requested page is many pages into the result. Say you request the 5,000th page. MySQL has to generate 5,000 pages of results on the server side, discard the first 4,999, and return only the last one. Unfortunately, that's how it works.
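For comparison, here is a minimal sketch of offset-based paging. The `items` table, the helper name `fetch_page_by_offset`, and the use of Python's `sqlite3` module instead of MySQL are assumptions for illustration. The offset is the number of matching rows to skip, `(page - 1) * per_page`, and an ORDER BY is included so that the row positions are well defined.

```python
import sqlite3

def fetch_page_by_offset(conn, type_value, page, per_page):
    """Return the ids on the given 1-based page of filtered results."""
    offset = (page - 1) * per_page  # matching rows to skip before this page
    rows = conn.execute(
        "SELECT id FROM items WHERE type = ? ORDER BY id LIMIT ? OFFSET ?",
        (type_value, per_page, offset),
    )
    return [r[0] for r in rows]

# Hypothetical data: every 7th row out of 100 has type 'a'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO items (id, type) VALUES (?, ?)",
    [(i, "a" if i % 7 == 0 else "b") for i in range(1, 101)],
)

# Any page can be requested directly, in any order.
page_three = fetch_page_by_offset(conn, "a", 3, 5)
page_two = fetch_page_by_offset(conn, "a", 2, 5)
print(page_two)
print(page_three)
```

The trade-off is the one described above: the pages can be requested in any order, but the server still has to walk past every skipped row, so deep pages cost more than early ones.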

Re your comment:

You must understand that WHERE applies conditions to the values in rows, whereas pages are defined by the positions of rows. These are two different ways of identifying rows!

If you have a column that is guaranteed to hold the row number, you can use its value as a row position. You can even put an index on it, or use it as the primary key.

But primary key values may change and may not be consecutive, for example if you update or delete rows, or roll back some transactions. Renumbering primary key values is a bad idea, because other tables or external data may reference them.

So you could add another column that is not the primary key and serves only as a row number.

-- ROW_NUMBER is a reserved word in MySQL 8.0 and later, hence the backticks.
ALTER TABLE MyTable ADD COLUMN `row_number` BIGINT UNSIGNED, ADD KEY (`row_number`);

Then fill in the values whenever you need to renumber the rows.

SET @row := 0;
UPDATE MyTable SET `row_number` = (@row := @row + 1) ORDER BY id;

You'd have to renumber the rows whenever you delete some, for example. Depending on the size of the table, doing this frequently is not efficient.

Also, new inserts cannot assign correct row number values without locking the table, which is necessary to prevent race conditions.

If you can guarantee that row_number is a sequence of consecutive values, then it is both a value and a row position, so you can use it for high-performance index lookups of any arbitrary page of rows.

SELECT * FROM MyTable WHERE `row_number` BETWEEN 401 AND 500;

At least, until the next delete or insert puts the sequence of row numbers in doubt.
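The life cycle of such a row-number column can be sketched as follows. This is an illustration under stated assumptions rather than the answer's exact method: it uses Python's `sqlite3` module in place of MySQL, a hypothetical `items` table with a column named `row_num`, and it renumbers from the client side instead of using the user-variable UPDATE shown earlier. No locking is shown, so it is only safe here because a single connection is in use.

```python
import sqlite3

def renumber(conn):
    """Assign consecutive row_num values 1..N in id order."""
    ids = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]
    conn.executemany(
        "UPDATE items SET row_num = ? WHERE id = ?",
        [(n, row_id) for n, row_id in enumerate(ids, start=1)],
    )

def page_by_row_num(conn, page, per_page):
    """Fetch a 1-based page with an indexed range lookup on row_num."""
    first = (page - 1) * per_page + 1
    last = page * per_page
    rows = conn.execute(
        "SELECT id FROM items WHERE row_num BETWEEN ? AND ? ORDER BY row_num",
        (first, last),
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, row_num INTEGER)")
conn.execute("CREATE INDEX idx_items_row_num ON items (row_num)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])

renumber(conn)
fresh = page_by_row_num(conn, 1, 5)     # full page while numbers are consecutive

conn.execute("DELETE FROM items WHERE id IN (3, 4)")
stale = page_by_row_num(conn, 1, 5)     # short page: row numbers 3 and 4 are gone

renumber(conn)
repaired = page_by_row_num(conn, 1, 5)  # full page again after renumbering

print(fresh, stale, repaired)
```

The short page after the delete is the failure mode the answer warns about: the range lookup is fast, but it is only correct while the numbering is known to be consecutive.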
