使用“光标"用于 PostgreSQL 中的分页 [英] Using "Cursors" for paging in PostgreSQL

查看:17
本文介绍了使用“光标"用于 PostgreSQL 中的分页的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

<块引用>

可能的重复:
如何为 API 客户端提供 1,000,000数据库结果?

想知道游标的使用是个好方法使用 PostgreSQL 实现分页".

用例是我们有超过 100,000 行可供我们的 API 客户端使用.我们认为实现这一目标的一个好方法是允许客户端分批(页面)请求信息.客户端一次可以请求 100 行.我们将返回 100 行和一个游标,然后当客户端准备好时,他们可以使用我们发送给他们的游标请求接下来的 100 行.

但是,我对游标的工作方式以及应该如何以及何时使用游标有点模糊:

  • 游标是否要求数据库连接保持打开状态?
  • 游标是否在事务内运行,锁定资源直到它们关闭"?
  • 还有其他我不知道的陷阱"吗?
  • 还有其他更好的方法来处理这种情况吗?

非常感谢!

解决方案

游标是在处理大型数据集的较小 Intranet 应用程序中进行分页的合理选择,但您需要准备好在超时后丢弃它们.用户喜欢闲逛、去吃午饭、去度假两周等,然后让他们的应用程序运行.如果它是一个基于网络的应用程序,甚至还有一个问题是什么是运行"以及如何判断用户是否还在.

它们不适合具有大量客户端的大型应用程序以及像基于 Web 的应用程序或 Web API 那样来来往往的客户端.我不建议在你的应用程序中使用游标,除非你有一个相当小的客户端数量和非常高的请求率......在这种情况下,发送小批量的行将非常低效,你应该考虑允许范围请求等.

游标有几个成本.如果游标不是WITH HOLD,您必须保持事务打开.打开的事务会阻止 autovacuum 正常工作,导致表膨胀和其他问题.如果游标声明为 WITH HOLD 并且事务没有保持打开状态,则您必须支付实现和存储潜在大结果集的成本 - 至少,我认为这就是保持游标的工作方式.替代方案同样糟糕,保持事务隐式打开,直到游标被销毁并防止行被清理.

此外,如果您使用游标,则无法将连接交还给连接池.每个客户端需要一个连接.这意味着更多的后端资源仅用于维护会话状态,并为您可以使用基于游标的方法处理的客户端数量设置了一个非常实际的上限.

与具有限制和偏移的无状态连接池方法相比,管理有状态的、基于游标的设置也存在复杂性和开销.您需要让您的应用程序在超时后使游标过期,或者您面临服务器上潜在的无限资源使用,并且您需要跟踪哪些连接具有哪些游标,哪些结果集为哪些用户....

总的来说,尽管它可能非常低效,LIMITOFFSET 可能是更好的解决方案.尽管,搜索主键通常比使用OFFSET更好.>

顺便说一下,您正在查看有关 PL/pgSQL 中游标的文档.您需要普通 SQL 级游标来完成这项工作.><小时><块引用>

游标是否要求数据库连接保持打开状态?

是的.

<块引用>

游标是否在事务内运行,锁定资源直到它们被关闭"了吗?

是的,除非它们是 WITH HOLD,在这种情况下它们会消耗其他数据库资源.

<块引用>

还有其他我不知道的陷阱"吗?

是的,正如上面应该解释的那样.

Possible Duplicate:
How to provide an API client with 1,000,000 database results?

Wondering of the use of Cursors is a good way to implement "paging" using PostgreSQL.

The use case is that we have upwards 100,000 rows that we'd like to make available to our API clients. We thought a good way to make this happen would be to allow the client to request the information in batches ( pages ). The client could request 100 rows at a time. We would return the 100 rows as well as a cursor, then when the client was ready, they could request the next 100 rows using the cursor that we sent to them.

However, I'm a little hazy on how cursors work and exactly how and when cursors should be used:

  • Do the cursors require that a database connection be left open?
  • Do the cursors run inside a transaction, locking resources until they are "closed"?
  • Are there any other "gotchas" that I'm not aware of?
  • Is there another, better way that this situation should be handled?

Thanks so much!

解决方案

Cursors are a reasonable choice for paging in smaller intranet applications that work with large data sets, but you need to be prepared to discard them after a timeout. Users like to wander off, go to lunch, go on holiday for two weeks, etc, and leave their applications running. If it's a web-based app there's even the question of what "running" is and how to tell if the user is still around.

They are not suitable for large-scale applications with high client counts and clients that come and go near-randomly like in web-based apps or web APIs. I would not recommend using cursors in your application unless you have a fairly small client count and very high request rates ... in which case sending tiny batches of rows will be very inefficient and you should think about allowing range-requests etc instead.

Cursors have several costs. If the cursor is not WITH HOLD you must keep a transaction open. The open transaction can prevent autovacuum from doing its work properly, causing table bloat and other issues. If the cursor is declared WITH HOLD and the transaction isn't held open you have to pay the cost of materializing and storing a potentially large result set - at least, I think that's how hold cursors work. The alternative is just as bad, keeping the transaction implicitly open until the cursor is destroyed and preventing rows from being cleaned up.

Additionally, if you're using cursors you can't hand connections back to a connection pool. You'll need one connection per client. That means more backend resources are used just maintaining session state, and sets a very real upper limit on the number of clients you can handle with a cursor-based approach.

There's also the complexity and overhead of managing a stateful, cursor-based setup as compared to a stateless connection-pooling approach with limit and offset. You need to have your application expire cursors after a timeout or you face potentially unbounded resource use on the server, and you need to keep track of which connections have which cursors for which result sets for which users....

In general, despite the fact that it can be quite inefficient, LIMIT and OFFSET can be the better solution. It can often be better to search the primary key rather than using OFFSET, though.

By the way, you were looking at the documentation for cursors in PL/pgSQL. You want normal SQL-level cursors for this job.


Do the cursors require that a database connection be left open?

Yes.

Do the cursors run inside a transaction, locking resources until they are "closed"?

Yes unless they are WITH HOLD, in which case they consume other database resources.

Are there any other "gotchas" that I'm not aware of?

Yes, as the above should explain.

这篇关于使用“光标"用于 PostgreSQL 中的分页的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆