Oracle分页策略 [英] Oracle Pagination strategy

查看:90
本文介绍了Oracle分页策略的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从两个时间戳之间的表中获取一百万行,然后对其进行处理.一次触发一个查询并立即检索所有记录似乎是一个错误的策略,因为它可能超出了我的JVM程序的内存功能.

I want to fetch million of rows from a table between two timestamps and then do processing over it. Firing a single query and retrieving all the records at once looks to be a bad strategy as it might exceed the memory capabilities of my JVM program.

我已经阅读了这篇文章:

I have read this article:

http://oracle.readthedocs.io /en/latest/sql/indexes/top-n-pagination.html

因此,我计划分1000进行分页,并提出了以下策略: 假设Start_Date = X and End_Date = Y

Hence, I am planning to do pagination in batches of 1000 and have come up with following strategy: Let's say Start_Date = X and End_Date = Y

  1. 触发查询,

select * from table where CREATE_TIMESTAMP > X AND CREATE_TIMESTAMP < Y ORDER BY CREATE_TIMESTAMP FETCH NEXT 1000 ROWS ONLY.

  1. 如果仅获得少于1000行,则意味着所有记录均已完成.如果我恰好得到1000行,那意味着可能会有更多的记录.

  1. If I get less than 1000 rows only, that means all records have been completed. If I get exactly 1000 rows, that means, there might be more records.

set X = CREATE_TIMESTAMP of 1000th record

select * from table where CREATE_TIMESTAMP > X AND CREATE_TIMESTAMP < Y ORDER BY CREATE_TIMESTAMP FETCH NEXT 1000 ROWS ONLY

这一直重复到我得到少于1000条记录为止.

This repeats until I get less than 1000 records.

有人看到这种方法有什么问题吗?

Does anyone see any issues with this approach?

推荐答案

分页模式是为了网站展示(与滚动导航相反)而发明的,在此效果最好.简而言之,实时用户实际上无法一次查看成千上万条记录,因此该信息被分成短页(50〜200条记录),其中通常每页向数据库发送一个查询.用户通常仅单击几页,但不会浏览所有页面,此外,用户需要花费一些时间来浏览页面,因此查询不会一一发送到数据库,而是间隔很长的时间.检索数据块的时间比检索所有数百万条记录的时间要短得多,因此用户很高兴,因为他不必等待很长的时间来等待后续页面,而且整个系统的负载也较小.

Pagination pattern has been invented for the purpose of websites presentation (in opposite to scrolling navigation), and works best there. In short, the live user is practically unable to view thousands/millions of records at once, so the information is divided into short pages (50~200 records), where one query is usually sent to the database for each page. The user usually clicks on a few pages only, but does not browse all of them, in addition the user needs a bit of time to browse the page, so the queries are not sent to the database one by one, but in long intervals. The time to retrieve a chunk of data is much shorter than retrieving all millions of record, so the user is happy because he does not have to wait long for subsequent pages, and the overall system load is smaller.

但是从这个问题来看,您的应用程序的性质似乎是针对批处理的,而不是针对网络演示的.应用程序必须获取所有记录,并对每个记录进行一些操作/转换(计算).在这种情况下,将使用完全不同的设计模式(流/流水线处理,步骤顺序,并行步骤/操作等),并且分页将不起作用,如果您采用这种方式,则会破坏系统性能.

But it seems from the question that the nature of your application is oriented to batch processing rather than to the web presentation. The application must fetch all records and do some operations/transformations (calculations) on each of the records. In this case , completely different design patterns are used (stream/pipelined processing, sequence of steps, parallel steps/operations etc), and pagination will not work, if you go that way you will kill your system performance.

代替花哨的理论,我们来看一个简单而实际的示例,该示例将向您展示我们在此处讨论的速度差异

假设有一个表PAGINATION包含大约700万条记录:

Instead of fancy theory, let's look at simple and practical example which will show you what differences in speed we are talking here

Let say there is a table PAGINATION with about 7 millions of records:

create table pagination as
select sysdate - 200 * dbms_random.value As my_date, t.*
from (
    select o.* from all_objects o 
    cross join (select * from dual connect by level <= 100)
    fetch first 10000000 rows only
) t;

select count(*) from pagination;

  COUNT(*)
----------
   7369600


假设在MY_DATE列上创建了一个索引,并且索引统计信息是最新的:


Let say there is an index created on MY_DATE column, and index statistics are fresh:

create index PAGINATION_IX on pagination( my_date );

BEGIN dbms_stats.gather_table_stats( 'TEST', 'PAGINATION', method_opt => 'FOR ALL COLUMNS' ); END;
/

让我们说,在以下日期之间,我们将处理表中约10%的记录:

Let say that we are going to process about 10% of records from the table between the below dates:

select count(*) from pagination
where my_date between date '2017-10-01' and '2017-10-21';

  COUNT(*)
----------
    736341


最后要说的是,为简单起见,我们的处理"将包括对字段之一的长度进行简单的求和.
这是一个简单的分页实现:


and finally let say that our "processing" for simplicity, will consist in simple summing of lengths of one of field.
This is a simple paging implementation:

public class Pagination {

    public static class RecordPojo {
        Date myDate;
        String objectName;

        public Date getMyDate() {
            return myDate;
        }
        public RecordPojo setMyDate(Date myDate) {
            this.myDate = myDate;
            return this;
        }
        public String getObjectName() {
            return objectName;
        }
        public RecordPojo setObjectName(String objectName) {
            this.objectName = objectName;
            return this;
        }
    };

    static class MyPaginator{

        private Connection conn;
        private int pageSize;
        private int currentPage = 0;

        public MyPaginator( Connection conn, int pageSize ) {
            this.conn = conn;
            this.pageSize = pageSize;
        }

        static final String QUERY = ""
                + "SELECT my_date, object_name FROM pagination "
                + "WHERE my_date between date '2017-10-01' and '2017-10-21' "
                + "ORDER BY my_date "
                + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";

        List<RecordPojo> getNextPage() {
            List<RecordPojo> list = new ArrayList<>();
            ResultSet rs = null;
            try( PreparedStatement ps = conn.prepareStatement(QUERY);) {
                ps.setInt(1, pageSize * currentPage++ );
                ps.setInt(2,  pageSize);
                rs = ps.executeQuery();

                while( rs.next()) {
                    list.add( new RecordPojo().setMyDate(rs.getDate(1)).setObjectName(rs.getString(2)));
                }

            } catch (SQLException e) {
                e.printStackTrace();
            }finally {
                try{rs.close();}catch(Exception e) {}
            }
            return list;
        }

        public int getCurrentPage() {
            return currentPage;
        }
    }


    public static void main(String ...x) throws SQLException {
        OracleDataSource ds = new OracleDataSource();
        ds.setURL("jdbc:oracle:thin:test/test@//localhost:1521/orcl");
        long startTime = System.currentTimeMillis();
        long value = 0;
        int pageSize = 1000;

        try( Connection conn = ds.getConnection();){
            MyPaginator p = new MyPaginator(conn, pageSize);
            List<RecordPojo> list;
            while( ( list = p.getNextPage()).size() > 0 ) {
                value += list.stream().map( y -> y.getObjectName().length()).mapToLong(Integer::longValue).sum();
                System.out.println("Page: " + p.getCurrentPage());
            }
            System.out.format("==================\nValue = %d, Pages = %d,  time = %d seconds", value, p.getCurrentPage(), (System.currentTimeMillis() - startTime)/1000);
        }
    }
}


结果是:


A result is:

Value = 18312338, Pages = 738,  time = 2216 seconds


现在让我们测试一个非常简单的基于流的解决方案-仅获取一条记录,对其进行处理,将其丢弃(释放内存),然后获取下一条.


Now let's test a very simple stream based solution - just take only one record, process it, discard it (freeing up memory), and take the next one.

public class NoPagination {

    static final String QUERY = ""
            + "SELECT my_date, object_name FROM pagination "
            + "WHERE my_date between date '2017-10-01' and '2017-10-21' "
            + "ORDER BY my_date ";

    public static void main(String[] args) throws SQLException {
        OracleDataSource ds = new OracleDataSource();
        ds.setURL("jdbc:oracle:thin:test/test@//localhost:1521/orcl");
        long startTime = System.currentTimeMillis();
        long count = 0;

        ResultSet rs = null;
        PreparedStatement ps = null;
        try( Connection conn = ds.getConnection();){
            ps = conn.prepareStatement(QUERY);
            rs = ps.executeQuery();
            while( rs.next()) {
                // processing
                RecordPojo r = new RecordPojo().setMyDate(rs.getDate(1)).setObjectName(rs.getString(2)); 
                count+=r.getObjectName().length();
            }
            System.out.format("==================\nValue = %d, time = %d seconds", count, (System.currentTimeMillis() - startTime)/1000);
        }finally {
            try { rs.close();}catch(Exception e) {}
            try { ps.close();}catch(Exception e) {}
        }
    }


结果是:


A result is:

Value = 18312328, time = 11 seconds

是-2216秒/11秒=快201倍-快20 100%!!!
逆天 ?您可以自己进行测试.
这个例子说明了选择正确的解决方案(正确的设计模式)来解决问题的重要性.

Yes - 2216 seconds / 11 seconds = 201 times faster - 20 100 % faster !!!
Unbelievable ? You can test it yourself.
This example shows how important it is to choose the right solution (right design patterns) to solve the problem.

这篇关于Oracle分页策略的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆