SQL LIMIT vs. JDBC Statement setMaxRows. Which one is better?
Question
I want to select the top 10 records for a given query. I can use one of the following options:
- Use the JDBC Statement.setMaxRows() method
- Use LIMIT and OFFSET in the SQL query
What are the advantages and disadvantages of these two options?
Recommended answer
SQL-level LIMIT
To restrict the SQL query result set size, you can use the SQL:2008 syntax:
SELECT title
FROM post
ORDER BY created_on DESC
OFFSET 50 ROWS
FETCH NEXT 50 ROWS ONLY
This syntax works on Oracle 12, SQL Server 2012, and PostgreSQL 8.4, or newer versions.
For MySQL, you can use the LIMIT and OFFSET clauses:
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
OFFSET 50
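The two syntaxes above differ only in the trailing clause, so an application that targets several databases could generate it. Below is a minimal sketch of that idea; the class name TopNQuery and its methods are hypothetical and not part of JDBC or any library. It assumes the base query already ends with an ORDER BY clause, and it appends only integer values, so no user text is concatenated into the SQL.

```java
final class TopNQuery {

    private TopNQuery() {
    }

    // SQL:2008 syntax, as used in the first query above.
    // Assumption: orderedQuery already ends with an ORDER BY clause.
    static String standard(String orderedQuery, int offset, int limit) {
        checkRange(offset, limit);
        return orderedQuery
            + " OFFSET " + offset + " ROWS"
            + " FETCH NEXT " + limit + " ROWS ONLY";
    }

    // MySQL syntax using the LIMIT and OFFSET clauses.
    static String mysql(String orderedQuery, int offset, int limit) {
        checkRange(offset, limit);
        return orderedQuery + " LIMIT " + limit + " OFFSET " + offset;
    }

    // Reject values that would produce an invalid or meaningless page.
    private static void checkRange(int offset, int limit) {
        if (offset < 0 || limit <= 0) {
            throw new IllegalArgumentException(
                "offset must be >= 0 and limit must be > 0");
        }
    }
}
```

For example, TopNQuery.mysql("SELECT title FROM post ORDER BY created_on DESC", 50, 50) returns the MySQL query shown above on a single line.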
The advantage of SQL-level pagination is that the database can use this information when building the execution plan.
So, if we have an index on the created_on column:
CREATE INDEX idx_post_created_on ON post (created_on DESC)
And we execute the following query that uses the LIMIT clause:
EXPLAIN ANALYZE
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
We can see that the database engine uses the index, since the optimizer knows that only 50 records are to be fetched:
Execution plan:
Limit (cost=0.28..25.35 rows=50 width=564)
(actual time=0.038..0.051 rows=50 loops=1)
-> Index Scan using idx_post_created_on on post p
(cost=0.28..260.04 rows=518 width=564)
(actual time=0.037..0.049 rows=50 loops=1)
Planning time: 1.511 ms
Execution time: 0.148 ms
JDBC Statement maxRows
According to the setMaxRows Javadoc:
"If the limit is exceeded, the excess rows are silently dropped."
That's not very reassuring!
So, if we execute the following query on PostgreSQL:
try (PreparedStatement statement = connection.prepareStatement("""
        SELECT title
        FROM post
        ORDER BY created_on DESC
        """)
) {
    // JDBC-level limit: the SQL text itself has no LIMIT clause
    statement.setMaxRows(50);
    // Close the ResultSet as soon as the rows have been read
    try (ResultSet resultSet = statement.executeQuery()) {
        int count = 0;
        while (resultSet.next()) {
            String title = resultSet.getString(1);
            count++;
        }
    }
}
We get the following execution plan in the PostgreSQL log:
Execution plan:
Sort (cost=65.53..66.83 rows=518 width=564)
(actual time=4.339..5.473 rows=5000 loops=1)
Sort Key: created_on DESC
Sort Method: quicksort Memory: 896kB
-> Seq Scan on post p (cost=0.00..42.18 rows=518 width=564)
(actual time=0.041..1.833 rows=5000 loops=1)
Planning time: 1.840 ms
Execution time: 6.611 ms
Because the database optimizer has no idea that we need to fetch only 50 records, it assumes that all 5000 rows need to be scanned. If a query needs to fetch a large number of records, a full-table scan actually costs less than using an index, hence the execution plan does not use the index at all.
I ran this test on Oracle, SQL Server, PostgreSQL, and MySQL, and it looks like the Oracle and PostgreSQL optimizers don't use the maxRows setting when generating the execution plan.
However, on SQL Server and MySQL, the maxRows JDBC setting is taken into consideration, and the execution plan is equivalent to that of an SQL query that uses TOP or LIMIT. You can run the tests for yourself, as they are available in my High-Performance Java Persistence GitHub repository.
Conclusion
Although setMaxRows looks like a portable solution for limiting the size of the ResultSet, SQL-level pagination is much more efficient if the database server optimizer doesn't use the JDBC maxRows property.
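As an illustration of the SQL-level alternative, here is a minimal sketch of the earlier JDBC example with the limit moved into the query as a bind parameter instead of setMaxRows. The class name LatestPostTitles and the method fetch are hypothetical, and the LIMIT ? form assumes PostgreSQL or MySQL; other databases need their own Top-N syntax.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class LatestPostTitles {

    // Assumption: LIMIT syntax as accepted by PostgreSQL and MySQL.
    static final String SQL =
        "SELECT title "
        + "FROM post "
        + "ORDER BY created_on DESC "
        + "LIMIT ?";

    private LatestPostTitles() {
    }

    // Returns at most maxResults titles, newest first.
    // The limit is part of the SQL statement, so the database sees it
    // when it builds the execution plan.
    static List<String> fetch(Connection connection, int maxResults)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            statement.setInt(1, maxResults);
            try (ResultSet resultSet = statement.executeQuery()) {
                List<String> titles = new ArrayList<>();
                while (resultSet.next()) {
                    titles.add(resultSet.getString(1));
                }
                return titles;
            }
        }
    }
}
```

Calling LatestPostTitles.fetch(connection, 50) would read the same 50 rows as the setMaxRows example, but whether the plan actually changes depends on the database and should be checked with EXPLAIN ANALYZE.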