在 postgresql 中对一个非常大的表进行分页和过滤(键集分页?) [英] pagination and filtering on a very large table in postgresql (keyset pagination?)

查看:82
本文介绍了在 postgresql 中对一个非常大的表进行分页和过滤(键集分页?)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个科学数据库,目前有 4,300,000 条记录.它是一个科学数据库,由 API 提供支持.到 2020 年 6 月,我可能会有大约 100,000,000 条记录.

I have a scientific database with currently 4,300,000 records. It's a scientific database, and an API is feeding it. In june 2020, I will probably have about 100,000,000 records.

这是表格'输出'的布局:

This is de layout of the table 'output':

ID   | sensor_ID    | speed    | velocity | direction
-----------------------------------------------------
1    | 1            | 10       | 1        | up
2    | 2            | 12       | 2        | up
3    | 2            | 11.5     | 1.5      | down
4    | 1            | 9.5      | 0.8      | down
5    | 3            | 11       | 0.75     | up
...

顺便说一句,这是虚拟数据.但是输出是一个包含 5 列的表:ID、sensor_ID、速度、速度和方向.

BTW, this is dummy data. But output is a table with 5 columns: ID, sensor_ID, speed, velocity and direction.

我想要实现的是一个体面的分页和过滤方法.我想创建一个网站(在 nodejs 中),其中将显示 +4,000,000 条记录(目前),每页 10,000 条记录.我还希望能够过滤 sensor_ID、速度、速度或方向.

What I want to achieve is a decent pagination and filter method. I want to create a website (in nodejs) where this +4,000,000 records (for now) will be displayed, 10,000 records per page. I also want to be able to filter on sensor_ID, speed, velocity or direction.

现在,我有这个用于选择特定行的查询:

For now, I have this query for selecting specific rows:

SELECT * FROM output ORDER BY ID DESC OFFSET 0 LIMIT 10000 // first 10,000 rows

SELECT * FROM output ORDER BY ID DESC OFFSET 10000 LIMIT 10000 // next 10,000 rows

...

我正在寻找有关创建合适的分页方法的一些信息/提示.就目前而言,按照我的方式,它仍然很快,但我认为当我们达到 +50,000,000 条记录时,速度会慢很多.

I'm searching for some information/tips about creating a decent pagination method. For now, it's still quiet fast the way I do it, but I think it will be a lot slower when we hit +50,000,000 records.

首先,我找到了这个页面:https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/.我对 keyset 分页很感兴趣.但老实说,我不知道如何开始.

First of all, I found this page: https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/. I'm interested in the keyset pagination. But to be honest, I have no clue how to start.

认为我必须做的:

在 ID 字段上创建索引:

Create an index on the ID-field:

CREATE UNIQUE INDEX index_id ON output USING btree (ID)

我也找到了这个页面:https://leopard.in.ua/2014/10/11/postgresql-分页.当您向下滚动到改进 #2:寻求方法"时,您可以看到他们删除了 OFFSET 子句,并使用了 WHERE 子句.我还看到他们在查询中使用了最后一个插入 ID:

I also found this page: https://leopard.in.ua/2014/10/11/postgresql-paginattion. When you scroll down to "Improvement #2: The Seek Method", you can see that they dropped the OFFSET-clause, and are using a WHERE-clause. I also see that they are using the last insert ID in their query:

SELECT * FROM output WHERE ID < <last_insert_id_here> ORDER BY ID DESC LIMIT 10000

我不完全理解这一点.对于第一页,我需要最后一个插入 ID.然后我获取 10,000 条最新记录.但是在那之后,要获得第二页,我不需要最后一个插入 ID,我需要第 10,000 个最后一个插入 ID(我猜).

I do not fully understand this. For the first page, I need the very last insert ID. Then I fetch the 10,000 newest records. But after that, to get the second page, I don't need the very last insert ID, I need the 10,000th last insert ID (I guess).

谁能给我一个快速的分页和过滤的很好的解释.

Can someone give me a good explanation about pagination and filtering in a fast way.

我正在使用的东西:- PostgreSQL- pgadmin(用于数据库管理)- node.js(最新版本)

The stuff I'm using: - postgresql - pgadmin (for database management) - node.js (latest version)

谢谢大家!祝 2020 年愉快!

Thanks everyone! And have a nice 2020!

编辑 1: 我不知道,但可以 MassiveJS (https://massivejs.org/) 好用吗?我应该在所有查询中使用它,还是只在分页查询中使用它?

EDIT 1: I have no clue, but could massiveJS (https://massivejs.org/) be something good to use? And should I use it on ALL queries, or only on the pagination queries?

编辑 2: 我想我已经弄明白了一点(如果我错了,请纠正我).

EDIT 2: I THINK I got it figured out a little bit (correct me if I'm wrong).

假设我有 100,000 条记录:

Let's say I have 100,000 records:

1) 获取最后插入的ID

1) Get the last inserted ID

2) 使用最后插入的 ID 获取最后 10,000 条记录

2) Use this last inserted ID to fetch the last 10,000 records

SELECT * FROM output WHERE ID < 100000 ORDER BY ID DESC LIMIT 10000 // last insert ID is here 100,000 because I have 100,000 records

3) 显示 10,000 条记录,同时保存 10,000 条记录的插入 ID 以供下次查询使用

3) Show the 10,000 records but also save the insert ID of the 10,000the record to use in the next query

4) 使用新的最后插入 id 获取接下来的 10,000 条记录

4) Get the next 10,000 records with the new last insert id

SELECT * FROM output WHERE ID < 90000 ORDER BY ID DESC LIMIT 10000 // 90,000 is the very last insert id - 10,000

5) ...

这是正确的吗?

推荐答案

这是我的处理方法.对于我获取的第一页,我使用

Here's how I handle this. For the first page I fetch, I use

SELECT id, col, col, col 
  FROM output 
 ORDER BY id DESC
 LIMIT 10000

然后,在我的客户端程序 (node.js) 中,我从结果集的最后一行捕获 id 值.当我需要下一页时,我会这样做.

Then, in my client program (node.js) I capture the id value from the last row of the result set. When I need the next page, I do this.

 SELECT id, col, col, col
   FROM output
  WHERE id < my_captured_id_value
  ORDER BY id DESC

这会利用索引.即使您从表中删除了一些行,它也能正常工作.

This exploits the index. And it works correctly even if you have deleted some rows from the table.

顺便说一下,如果您的第一个分页页面具有最大的 id,您可能需要一个降序索引.CREATE UNIQUE INDEX index_id ON output USING btree (ID DESC).

By the way, you probably want a descending index if your first pagination page has the largest ids. CREATE UNIQUE INDEX index_id ON output USING btree (ID DESC).

专业提示 SELECT * 对大型数据库的性能有害.始终列出您实际需要的列.

Pro tip SELECT * is harmful to performance on large databases. Always list the columns you actually need.

这篇关于在 postgresql 中对一个非常大的表进行分页和过滤(键集分页?)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆