OFFSET vs. ROW_NUMBER()


Question

As we know, Postgresql's OFFSET requires that it scan through all the rows up until the point it gets to where you requested, which makes it kind of useless for pagination through huge result sets, getting slower and slower as the OFFSET goes up.

PG 8.4 now supports window functions. Instead of:

SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500

you can say:

SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo
WHERE rownum > 500 AND rownum <= 510

Does the latter approach help us at all? Or do we have to keep using identifying columns and temp tables for large pagination?
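(For reference, the "identifying columns" approach mentioned here is what's now usually called keyset pagination: remember the key of the last row on the previous page and seek past it with an indexed comparison, so the cost stays flat no matter how deep you page. A minimal sketch using the `table`/`somecol` names from the example query above; the helper name and the caller-maintained `last_somecol` value are illustrative, not from the question:

```python
def keyset_page_sql(last_somecol, limit):
    # Seek past the last value seen on the previous page; with an index
    # on somecol this avoids scanning the skipped rows entirely.
    # (Assumes somecol is unique; otherwise add a tiebreaker column.)
    return ("SELECT * FROM table WHERE somecol > %d "
            "ORDER BY somecol LIMIT %d" % (last_somecol, limit))
```

The caller keeps `last_somecol` from the final row of each page; the trade-off is that you can only step to adjacent pages, not jump to an arbitrary offset.)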

Answer

I've constructed a test which compares OFFSET, cursors, and ROW_NUMBER(). My impression of ROW_NUMBER(), that it would be consistent in speed regardless of where you are in the result set, is correct. However, that speed is dramatically slower than either OFFSET or CURSOR, which, as was also my impression, are pretty much the same in speed, both degrading in speed the further out to the end of the result you go.

Results:

offset(100,100): 0.016359
scroll(100,100): 0.018393
rownum(100,100): 15.535614

offset(100,480000): 1.761800
scroll(100,480000): 1.781913
rownum(100,480000): 15.158601

offset(100,999900): 3.670898
scroll(100,999900): 3.664517
rownum(100,999900): 14.581068

The test script uses sqlalchemy to set up tables and 1000000 rows of test data. It then uses a psycopg2 cursor to execute each SELECT statement and fetch results with the three different methods.

from sqlalchemy import *

metadata = MetaData()
engine = create_engine('postgresql://scott:tiger@localhost/test', echo=True)

t1 = Table('t1', metadata,
    Column('id', Integer, primary_key=True),
    Column('d1', String(50)),
    Column('d2', String(50)),
    Column('d3', String(50)),
    Column('d4', String(50)),
    Column('d5', String(50))
)

if not engine.has_table('t1'):
    conn = engine.connect()
    t1.create(conn)

    # 1000000 rows
    for i in range(100):
        conn.execute(t1.insert(), [
            dict(
                ('d%d' % col, "data data data %d %d" % (col, (i * 10000) + j))
                for col in range(1, 6)
            ) for j in range(1, 10001)
        ])

import time

def timeit(fn, count, *args):
    now = time.time()
    for i in range(count):
        fn(*args)
    total = time.time() - now
    print("%s(%s): %f" % (fn.__name__, ",".join(repr(x) for x in args), total))

# this is a raw psycopg2 connection.
conn = engine.raw_connection()

def offset(limit, offset):
    cursor = conn.cursor()
    cursor.execute("select * from t1 order by id limit %d offset %d" % (limit, offset))
    cursor.fetchall()
    cursor.close()

def rownum(limit, offset):
    cursor = conn.cursor()
    cursor.execute("select * from (select *, "
                    "row_number() over (order by id asc) as rownum from t1) as foo "
                    "where rownum>=%d and rownum<%d" % (offset, limit + offset))
    cursor.fetchall()
    cursor.close()

def scroll(limit, offset):
    cursor = conn.cursor('foo')
    cursor.execute("select * from t1 order by id")
    cursor.scroll(offset)
    cursor.fetchmany(limit)
    cursor.close()

print()

timeit(offset, 10, 100, 100)
timeit(scroll, 10, 100, 100)
timeit(rownum, 10, 100, 100)

print()

timeit(offset, 10, 100, 480000)
timeit(scroll, 10, 100, 480000)
timeit(rownum, 10, 100, 480000)

print()

timeit(offset, 10, 100, 999900)
timeit(scroll, 10, 100, 999900)
timeit(rownum, 10, 100, 999900)
