Order BY turns a 30ms query into a 7120ms query. Known performance issue?

Problem description

I have a User table with 1m records:

User (id, fname, lname, deleted_at, guest)

I have the following query which is being run against a postgres 9.1 db:

SELECT "users".* 
FROM "users" 
WHERE (users.deleted_at IS NULL) AND (SUBSTRING(lower(fname), 1, 1) = 's') 
ORDER BY guest = false, fname ASC 
LIMIT 25 OFFSET 0

Using pgAdmin 3, this SQL is taking 7120ms to return 25 rows. If I remove the 'ORDER BY guest = false, fname ASC' the query takes just 31ms.

I have the following indexes:

add_index "users", ["fname"], :name => "index_users_on_fname"
add_index "users", ["guest", "fname"], :name => "index_users_on_guest_and_fname"
add_index "users", ["deleted_at"], :name => "index_users_on_deleted_at"
add_index "users", ["guest"], :name => "index_users_on_guest"

Any ideas? Thanks!

Update: EXPLAIN output

"Limit  (cost=43541.55..43541.62 rows=25 width=1612) (actual time=1276.777..1276.783 rows=25 loops=1)"
"  ->  Sort  (cost=43541.55..43558.82 rows=6905 width=1612) (actual time=1276.775..1276.777 rows=25 loops=1)"
"        Sort Key: ((NOT guest)), fname"
"        Sort Method: top-N heapsort  Memory: 37kB"
"        ->  Seq Scan on users  (cost=0.00..43346.70 rows=6905 width=1612) (actual time=5.143..1272.563 rows=475 loops=1)"
"              Filter: ((deleted_at IS NULL) AND pubic_profile_visible AND ((fname)::text ~~ 's%'::text))"
"Total runtime: 1276.967 ms"


Recommended answer

First, since PostgreSQL 9.1 you can use left() to simplify the expression:

substring(lower(fname), 1, 1)
lower(left(fname, 1)) -- equivalent, but simpler and faster

It is also slightly faster to take the first character before casting to lower case.
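
A quick sanity check (a minimal sketch; the literal 'Sam' is just a made-up value): both expressions return 's', but the second one only has to lower-case a single character.

SELECT substring(lower('Sam'), 1, 1) AS old_expr,  -- original expression
       lower(left('Sam', 1))         AS new_expr;  -- simplified equivalent
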
Next, clean up the query:

SELECT * 
FROM   users 
WHERE  deleted_at IS NULL
AND    lower(left(fname, 1)) = 's'
ORDER  BY guest DESC NULLS LAST, fname
LIMIT  25 OFFSET 0;

guest DESC NULLS LAST sorts the same way as guest = FALSE, just without calculating a new value for every row.
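
A tiny demonstration with throwaway data (the rows below are made up purely for illustration) shows the resulting order: guests first, then non-guests, NULLs at the end, each group sorted by fname:

SELECT guest, fname
FROM  (VALUES (true, 'sam'), (false, 'sara'), (NULL::boolean, 'sven'), (true, 'sean')) AS v(guest, fname)
ORDER  BY guest DESC NULLS LAST, fname;
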
Next, create this multi-column partial index:

CREATE INDEX users_multi_idx
ON users (lower(left(fname, 1)), guest DESC NULLS LAST, fname)
WHERE deleted_at IS NULL;

Then run:

ANALYZE users;

Or, even better, CLUSTER (if you don't have more important queries requiring a different order) - and then ANALYZE:

CLUSTER users using users_multi_idx;

And it will be way faster than anything you tried before, because now the query reads rows from the index sequentially and the table has been physically rewritten in the same order, resulting in only a few page hits ...
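
If you want to verify the effect, you could re-run the query with buffer statistics (a sketch; the plan and the numbers will of course differ on your data):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   users
WHERE  deleted_at IS NULL
AND    lower(left(fname, 1)) = 's'
ORDER  BY guest DESC NULLS LAST, fname
LIMIT  25;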
