Handling paging with changing sort orders


Question

I'm creating a RESTful web service (in Golang) that pulls a set of rows from the database and returns them to a client (a smartphone app or web application). The service needs to be able to provide paging. The only problem is that this data is sorted on a regularly changing "computed" column (for example, the number of "thumbs up" or "thumbs down" a piece of content on a website has), so rows can jump between page numbers from one client request to the next.
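For context, here is a minimal sketch of the kind of query behind such an endpoint, using plain LIMIT/OFFSET paging. The table and column names (content, upvotes, downvotes) are illustrative assumptions, not something from the original post. Because the ORDER BY is computed from live vote counts, the same OFFSET can return rows the client has already seen, or skip rows, between two requests:

```go
// Minimal sketch of the current (problematic) approach; all object names
// here (content, upvotes, downvotes) are illustrative assumptions.
package naive

import (
	"database/sql"

	_ "github.com/lib/pq" // PostgreSQL driver
)

const pageSize = 20

type Item struct {
	ID    int64
	Title string
	Score int64
}

// GetPage pages with LIMIT/OFFSET over a computed, constantly changing score.
// If votes arrive between two requests, the same OFFSET no longer lines up
// with the rows the client saw, so items repeat or disappear across pages.
func GetPage(db *sql.DB, page int) ([]Item, error) {
	rows, err := db.Query(
		`SELECT id, title, upvotes - downvotes AS score
		   FROM content
		  ORDER BY score DESC, id
		  LIMIT $1 OFFSET $2`,
		pageSize, page*pageSize)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []Item
	for rows.Next() {
		var it Item
		if err := rows.Scan(&it.ID, &it.Title, &it.Score); err != nil {
			return nil, err
		}
		out = append(out, it)
	}
	return out, rows.Err()
}
```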



I've looked at a few PostgreSQL features that could potentially help me solve this problem, but neither really seems like a very good solution (rough sketches of both follow the list below):




  • Materialized views: hold "stale" data that is only refreshed every once in a while. This doesn't really solve the problem, because the data would still jump around if a user happens to be paging through it when the materialized view is refreshed.

  • Cursors: one created per client session and held open between requests. This seems like it would be a nightmare if there are many concurrent sessions at once (and there will be).
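For reference, a rough sketch of what those two options look like when driven from Go. The object names (content, content_ranked, client_cur) are assumptions for illustration; the materialized view only moves the jump to REFRESH time, and the cursor needs a transaction held open for as long as the client keeps paging:

```go
// Sketch of the two PostgreSQL options considered above; all object names
// (content, content_ranked, cursor names) are illustrative assumptions.
package options

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

// Option 1: a materialized view refreshed periodically. Clients page over the
// view, so the ordering only changes (and rows only jump) at refresh time.
func refreshRanking(db *sql.DB) error {
	if _, err := db.Exec(`CREATE MATERIALIZED VIEW IF NOT EXISTS content_ranked AS
	        SELECT id, title, upvotes - downvotes AS score
	          FROM content`); err != nil {
		return err
	}
	_, err := db.Exec(`REFRESH MATERIALIZED VIEW content_ranked`)
	return err
}

// Option 2: a server-side cursor per client. It pins a stable ordering, but
// the transaction (and cursor) must stay open between requests, which is what
// makes large numbers of concurrent sessions painful.
func openClientCursor(db *sql.DB, cursorName string) (*sql.Tx, error) {
	tx, err := db.Begin()
	if err != nil {
		return nil, err
	}
	// Cursor names cannot be bound as parameters, hence Sprintf; a real
	// implementation has to generate and sanitise the name itself.
	if _, err := tx.Exec(fmt.Sprintf(`DECLARE %s CURSOR FOR
	        SELECT id, title, upvotes - downvotes AS score
	          FROM content
	         ORDER BY score DESC, id`, cursorName)); err != nil {
		tx.Rollback()
		return nil, err
	}
	return tx, nil
}

// fetchPage pulls the next pageSize rows from the held-open cursor.
func fetchPage(tx *sql.Tx, cursorName string, pageSize int) (*sql.Rows, error) {
	return tx.Query(fmt.Sprintf(`FETCH %d FROM %s`, pageSize, cursorName))
}
```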



Does anybody have any suggestions on how to handle this, either on the client side or the database side? Is there anything I can really do, or is an issue like this normally just remedied by the clients consuming the data?



Edit: I should mention that the smartphone app lets users view more data through "infinite scrolling", so it keeps track of its own list of data client-side.

Solution

This is a problem without a perfectly satisfactory solution, because you're trying to combine requirements that are essentially incompatible:


  • Send only the required amount of data to the client on demand, i.e. you can't download the whole dataset and then paginate it client-side.


  • Minimise the amount of per-client state that the server must keep track of, for scalability with large numbers of clients.

  • Maintain different state for each client.

This is a "pick any two" kind of situation. You have to compromise: accept that you can't keep each client's pagination state exactly right, accept that you have to download a big dataset to the client, or accept that you have to use a huge amount of server resources to maintain client state.



There are variations that mix those compromises in different ways, but that's what it all boils down to.



For example, some people will send the client some extra data, enough to satisfy most clients' requirements. If a client goes beyond that, it gets broken pagination.



Some systems will cache client state for a short period (using short-lived unlogged tables, temp files, or whatever), but expire it quickly, so if the client isn't constantly asking for fresh data it gets broken pagination.
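As a sketch of that variant, the per-client state could be parked in a short-lived UNLOGGED table and dropped when it expires; the table-naming scheme, the snapshot size and the expiry policy below are my assumptions, not part of the original answer:

```go
// Sketch of short-lived per-client pagination state held in an UNLOGGED table.
// Unlogged tables skip WAL, so they are cheap but not crash-safe, which is
// fine for disposable snapshots. Names and sizes are illustrative assumptions.
package snapshot

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq"
)

// createSnapshot freezes the current ordering for one client.
// clientID is assumed to be sanitised before being used in an identifier.
func createSnapshot(db *sql.DB, clientID string, maxRows int) error {
	_, err := db.Exec(fmt.Sprintf(`CREATE UNLOGGED TABLE page_snap_%s AS
	        SELECT row_number() OVER (ORDER BY upvotes - downvotes DESC, id) AS rn,
	               id, title, upvotes - downvotes AS score
	          FROM content
	         ORDER BY rn
	         LIMIT %d`, clientID, maxRows))
	return err
}

// readPage reads from the frozen snapshot, so pages never jump...
func readPage(db *sql.DB, clientID string, offset, pageSize int) (*sql.Rows, error) {
	return db.Query(fmt.Sprintf(`SELECT id, title, score
	          FROM page_snap_%s
	         WHERE rn > %d
	         ORDER BY rn
	         LIMIT %d`, clientID, offset, pageSize))
}

// ...until a background job expires idle snapshots and the client starts over.
func expireSnapshot(db *sql.DB, clientID string) error {
	_, err := db.Exec(fmt.Sprintf(`DROP TABLE IF EXISTS page_snap_%s`, clientID))
	return err
}
```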



Etc.

See also:



I'd probably implement a hybrid solution of some form, like the following (roughly sketched in code after the list):


  • Using a cursor, read the first part of the data and immediately send it to the client.

  • Immediately fetch enough extra data from the cursor to satisfy 99% of clients' requirements. Store it in a fast, unsafe cache like memcached, Redis, BigMemory, EHCache, whatever, under a key that lets me retrieve it for later requests by the same client. Then close the cursor to free the DB resources.
  • Expire the cache on a least-recently-used basis, so if the client doesn't keep reading fast enough it has to fetch a fresh set of data from the DB, and the pagination changes.

  • If a client wants more results than the vast majority of its peers, pagination will change at some point as you switch to reading directly from the DB rather than the cache, or generate a new, bigger cached dataset.
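A rough sketch of that flow under stated assumptions: Redis (via github.com/redis/go-redis/v9) stands in for the "fast, unsafe cache", and the table/column names, key scheme, page/prefetch sizes and TTL are all made up for illustration. The least-recently-used behaviour would come from Redis itself (e.g. maxmemory-policy allkeys-lru); the TTL here only covers idle clients.

```go
// Sketch of the hybrid flow described above. Table/column names, key scheme,
// page/prefetch sizes and the TTL are all illustrative assumptions.
package hybrid

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"time"

	_ "github.com/lib/pq"
	"github.com/redis/go-redis/v9"
)

const (
	pageSize = 20
	prefetch = 500              // extra rows: enough for ~99% of clients (assumption)
	cacheTTL = 10 * time.Minute // idle clients fall out of the cache and re-query
)

type Item struct {
	ID    int64  `json:"id"`
	Title string `json:"title"`
	Score int64  `json:"score"`
}

func cacheKey(clientID string) string { return "page_snap:" + clientID }

// FirstPage reads page 0 plus a prefetch window through a cursor inside one
// short transaction, parks the overflow in Redis under the client's key, then
// closes the cursor so no database state outlives the request.
func FirstPage(ctx context.Context, db *sql.DB, rdb *redis.Client, clientID string) ([]Item, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return nil, err
	}
	defer tx.Rollback() // ends the transaction, which also closes the cursor

	if _, err := tx.ExecContext(ctx, `DECLARE c CURSOR FOR
	        SELECT id, title, upvotes - downvotes AS score
	          FROM content
	         ORDER BY score DESC, id`); err != nil {
		return nil, err
	}
	all, err := fetchN(ctx, tx, pageSize+prefetch)
	if err != nil {
		return nil, err
	}

	first := all
	if len(all) > pageSize {
		first = all[:pageSize]
		blob, _ := json.Marshal(all[pageSize:])
		if err := rdb.Set(ctx, cacheKey(clientID), blob, cacheTTL).Err(); err != nil {
			return nil, err
		}
	}
	return first, nil
}

// NextPage serves later pages from the cached snapshot. A cache miss (expired,
// evicted, or past the prefetch window) returns redis.Nil; the caller then
// falls back to FirstPage and the client sees the ordering change.
func NextPage(ctx context.Context, rdb *redis.Client, clientID string, page int) ([]Item, error) {
	key := cacheKey(clientID)
	blob, err := rdb.Get(ctx, key).Bytes()
	if err != nil {
		return nil, err // redis.Nil => snapshot gone
	}
	rdb.Expire(ctx, key, cacheTTL) // sliding expiry for clients that keep reading

	var cached []Item
	if err := json.Unmarshal(blob, &cached); err != nil {
		return nil, err
	}
	start := (page - 1) * pageSize // page 0 was already served by FirstPage
	if start >= len(cached) {
		return nil, redis.Nil
	}
	end := start + pageSize
	if end > len(cached) {
		end = len(cached)
	}
	return cached[start:end], nil
}

// fetchN pulls up to n rows from the open cursor (FETCH cannot take bind
// parameters, hence Sprintf with a trusted integer).
func fetchN(ctx context.Context, tx *sql.Tx, n int) ([]Item, error) {
	rows, err := tx.QueryContext(ctx, fmt.Sprintf(`FETCH %d FROM c`, n))
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []Item
	for rows.Next() {
		var it Item
		if err := rows.Scan(&it.ID, &it.Title, &it.Score); err != nil {
			return nil, err
		}
		out = append(out, it)
	}
	return out, rows.Err()
}
```

On the HTTP side, the first request would route to FirstPage and later ones to NextPage, falling back to FirstPage whenever redis.Nil comes back (which is exactly the point where the client sees pagination change).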

That way most clients won't notice pagination issues and you don't have to send vast amounts of data to most clients, but you won't melt your DB server either. You do need a big cache to get away with this, though. Whether it's practical depends on whether your clients can cope with pagination breaking: if breaking pagination is simply not acceptable, then you're stuck doing it DB-side with cursors, temp tables, copying the whole result set at the first request, and so on. It also depends on the size of the dataset and how much data each client usually needs.


