Using "Cursors" for paging in PostgreSQL


Question

Possible Duplicate:
How to provide an API client with 1,000,000 database results?

I'm wondering whether using cursors is a good way to implement "paging" in PostgreSQL.

The use case is that we have upwards of 100,000 rows that we'd like to make available to our API clients. We thought a good way to make this happen would be to allow the client to request the information in batches (pages). The client could request 100 rows at a time. We would return the 100 rows as well as a cursor; then, when the client was ready, they could request the next 100 rows using the cursor that we sent to them.

However, I'm a little hazy on how cursors work and exactly how and when cursors should be used:

  • Do the cursors require that a database connection be left open?
  • Do the cursors run inside a transaction, locking resources until they are "closed"?
  • Are there any other "gotchas" that I'm not aware of?
  • Is there another, better way that this situation should be handled?

Thanks so much!

Solution

Cursors are a reasonable choice for paging in smaller intranet applications that work with large data sets, but you need to be prepared to discard them after a timeout. Users like to wander off, go to lunch, go on holiday for two weeks, etc, and leave their applications running. If it's a web-based app there's even the question of what "running" is and how to tell if the user is still around.

They are not suitable for large-scale applications with high client counts and clients that come and go near-randomly like in web-based apps or web APIs. I would not recommend using cursors in your application unless you have a fairly small client count and very high request rates ... in which case sending tiny batches of rows will be very inefficient and you should think about allowing range-requests etc instead.

Cursors have several costs. If the cursor is not WITH HOLD, you must keep a transaction open. The open transaction can prevent autovacuum from doing its work properly, causing table bloat and other issues. If the cursor is declared WITH HOLD and the transaction isn't held open, you have to pay the cost of materializing and storing a potentially large result set (at least, I think that's how hold cursors work). The alternative would be just as bad: keeping the transaction implicitly open until the cursor is destroyed, which prevents rows from being cleaned up.
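To make the two cases concrete, here is a rough sketch of the statement order each kind of SQL-level cursor implies. It only builds strings and does not talk to a database; the helper name `cursor_statements`, the cursor name `page_cur`, and the `items` table are made up for illustration. As far as I know, the PostgreSQL documentation describes held cursors as having their rows copied to memory or a temporary file when the creating transaction commits, which matches the materialization cost described above.

```python
def cursor_statements(name, query, page_size=100, with_hold=False):
    """Return the SQL statements, in order, for one page fetched via a cursor.

    Hypothetical helper for illustration only: it builds strings and runs nothing.
    """
    # Cursor names are identifiers and cannot be bound as parameters,
    # so apply a rough sanity check before interpolating.
    if not name.isidentifier():
        raise ValueError(f"unsafe cursor name: {name!r}")

    hold = " WITH HOLD" if with_hold else ""
    declare = f"DECLARE {name} CURSOR{hold} FOR {query}"
    fetch = f"FETCH FORWARD {int(page_size)} FROM {name}"
    close = f"CLOSE {name}"

    if with_hold:
        # The cursor survives COMMIT, so later fetches need no open
        # transaction, but the result set has to be stored by the server.
        return ["BEGIN", declare, "COMMIT", fetch, close]

    # Without WITH HOLD, every fetch must happen before COMMIT, so the
    # transaction stays open for as long as the client keeps paging.
    return ["BEGIN", declare, fetch, close, "COMMIT"]


plain = cursor_statements("page_cur", "SELECT * FROM items ORDER BY id")
held = cursor_statements("page_cur", "SELECT * FROM items ORDER BY id", with_hold=True)
```

In both variants the fetches have to run on the same connection that declared the cursor.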

Additionally, if you're using cursors you can't hand connections back to a connection pool. You'll need one connection per client. That means more backend resources are used just maintaining session state, and sets a very real upper limit on the number of clients you can handle with a cursor-based approach.

There's also the complexity and overhead of managing a stateful, cursor-based setup as compared to a stateless connection-pooling approach with limit and offset. You need to have your application expire cursors after a timeout or you face potentially unbounded resource use on the server, and you need to keep track of which connections have which cursors for which result sets for which users....
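The bookkeeping described above might look something like the following minimal sketch. Everything here is an assumption for illustration: the `CursorRegistry` class, the `close_cursor` callback (which in a real application would issue CLOSE and release the dedicated connection), and the injectable clock are not part of PostgreSQL or any driver.

```python
import time


class CursorRegistry:
    """Tracks which client owns which server-side cursor and expires idle ones."""

    def __init__(self, ttl_seconds, close_cursor, clock=time.monotonic):
        self._ttl = ttl_seconds
        # Hypothetical callback: would issue CLOSE and free the connection.
        self._close = close_cursor
        self._clock = clock
        self._entries = {}  # client_id -> (cursor_name, last_used)

    def register(self, client_id, cursor_name):
        # A client gets one cursor at a time; close any previous one first.
        old = self._entries.pop(client_id, None)
        if old is not None:
            self._close(client_id, old[0])
        self._entries[client_id] = (cursor_name, self._clock())

    def touch(self, client_id):
        """Return the client's cursor name and reset its idle timer.

        Returns None if the client is unknown or its cursor has expired.
        """
        self.expire_idle()
        entry = self._entries.get(client_id)
        if entry is None:
            return None
        self._entries[client_id] = (entry[0], self._clock())
        return entry[0]

    def expire_idle(self):
        """Close every cursor idle for longer than the TTL; return their client ids."""
        now = self._clock()
        stale = [c for c, (_, used) in self._entries.items() if now - used > self._ttl]
        for client_id in stale:
            cursor_name, _ = self._entries.pop(client_id)
            self._close(client_id, cursor_name)
        return stale
```

Even this toy version has to be called on every request and from a periodic sweep, and it says nothing about routing each client back to the one connection that holds its cursor, which is the part that does not fit a connection pool.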

In general, despite the fact that it can be quite inefficient, LIMIT and OFFSET can be the better solution. It can often be better to search the primary key rather than using OFFSET, though.
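As a rough illustration of the difference, the sketch below pages through a table both ways. It uses Python's built-in `sqlite3` module purely as a stand-in so that it runs without a server; the `items` table, the function names, and the assumption that ids are positive integers are all invented for the example. The two SELECT statements are plain SQL that should carry over to PostgreSQL with its driver's placeholder style.

```python
import sqlite3

# sqlite3 is only a stand-in here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items (id, payload) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 251)],
)


def fetch_page_offset(conn, page, page_size=100):
    """LIMIT/OFFSET paging: stateless, but skipped rows still have to be scanned."""
    return conn.execute(
        "SELECT id, payload FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()


def fetch_page_keyset(conn, after_id=0, page_size=100):
    """Primary-key paging: seek directly past the last id the client saw.

    Returns the rows plus a token for the next request, or None when a
    short page shows there is nothing left. Assumes ids are positive.
    """
    rows = conn.execute(
        "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # A full page may be followed by more rows; hand back the last id seen.
    next_after = rows[-1][0] if len(rows) == page_size else None
    return rows, next_after


rows, token = fetch_page_keyset(conn)
rows2, token2 = fetch_page_keyset(conn, after_id=token)
rows3, token3 = fetch_page_keyset(conn, after_id=token2)
```

The token returned to the API client plays the role the question assigned to the "cursor", but it is just the last primary key value, so the server keeps no state and no connection between requests.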

By the way, you were looking at the documentation for cursors in PL/pgSQL. You want normal SQL-level cursors for this job.


Do the cursors require that a database connection be left open?

Yes.

Do the cursors run inside a transaction, locking resources until they are "closed"?

Yes, unless they are WITH HOLD, in which case they consume other database resources.

Are there any other "gotchas" that I'm not aware of?

Yes, as the above should explain.
