Improve performance of first query


Problem description


If the following database (Postgres) query is executed twice, the second call is much faster.

I guess the first call is slow because the operating system (Linux) has to fetch the data from disk. The second call benefits from caching at the filesystem level and inside Postgres.
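To check that hypothesis, EXPLAIN (ANALYZE, BUFFERS) (available since PostgreSQL 9.0, so usable on 9.1.2) reports how many pages each call found in shared_buffers versus read from disk. A minimal sketch against a simplified form of the query shown below (the column list and the belegart_id filter are omitted here):

EXPLAIN (ANALYZE, BUFFERS)
SELECT "foo3_beleg"."id"
FROM "foo3_beleg"
WHERE "foo3_beleg"."id" IN (SELECT beleg_id
                            FROM foo3_text
                            WHERE content @@ 'footown'::tsquery);
-- Each plan node gets a "Buffers: shared hit=... read=..." line:
-- "read" pages came from disk, "hit" pages came from the Postgres buffer cache.
-- Many reads on the first call and mostly hits on the second would confirm the caching effect.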

Is there a way to optimize the database to get the results fast on the first call?

First call (slow)

foo3_bar_p@BAR-FOO3-Test:~$ psql

foo3_bar_p=# explain analyze SELECT "foo3_beleg"."id", ... FROM "foo3_beleg" WHERE 
foo3_bar_p-# (("foo3_beleg"."id" IN (SELECT beleg_id FROM foo3_text where 
foo3_bar_p(# content @@ 'footown'::tsquery)) AND "foo3_beleg"."belegart_id" IN 
foo3_bar_p(# ('...', ...));
                                                                                             QUERY PLAN                                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=75314.58..121963.20 rows=152 width=135) (actual time=27253.451..88462.165 rows=11 loops=1)
   ->  HashAggregate  (cost=75314.58..75366.87 rows=5229 width=4) (actual time=16087.345..16113.988 rows=17671 loops=1)
         ->  Bitmap Heap Scan on foo3_text  (cost=273.72..75254.67 rows=23964 width=4) (actual time=327.653..16026.787 rows=27405 loops=1)
               Recheck Cond: (content @@ '''footown'''::tsquery)
               ->  Bitmap Index Scan on foo3_text_content_idx  (cost=0.00..267.73 rows=23964 width=0) (actual time=281.909..281.909 rows=27405 loops=1)
                     Index Cond: (content @@ '''footown'''::tsquery)
   ->  Index Scan using foo3_beleg_pkey on foo3_beleg  (cost=0.00..8.90 rows=1 width=135) (actual time=4.092..4.092 rows=0 loops=17671)
         Index Cond: (id = foo3_text.beleg_id)
         Filter: ((belegart_id)::text = ANY ('{...
         Rows Removed by Filter: 1
 Total runtime: 88462.809 ms
(11 rows)

Second call (fast)

  Nested Loop  (cost=75314.58..121963.20 rows=152 width=135) (actual time=127.569..348.705 rows=11 loops=1)
   ->  HashAggregate  (cost=75314.58..75366.87 rows=5229 width=4) (actual time=114.390..133.131 rows=17671 loops=1)
         ->  Bitmap Heap Scan on foo3_text  (cost=273.72..75254.67 rows=23964 width=4) (actual time=11.961..97.943 rows=27405 loops=1)
               Recheck Cond: (content @@ '''footown'''::tsquery)
               ->  Bitmap Index Scan on foo3_text_content_idx  (cost=0.00..267.73 rows=23964 width=0) (actual time=9.226..9.226 rows=27405 loops=1)
                     Index Cond: (content @@ '''footown'''::tsquery)
   ->  Index Scan using foo3_beleg_pkey on foo3_beleg  (cost=0.00..8.90 rows=1 width=135) (actual time=0.012..0.012 rows=0 loops=17671)
         Index Cond: (id = foo3_text.beleg_id)
         Filter: ((belegart_id)::text = ANY ('...
         Rows Removed by Filter: 1
 Total runtime: 348.833 ms
(11 rows)

Table layout of the foo3_text table (28M rows)

foo3_egs_p=# \d foo3_text
                                 Table "public.foo3_text"
  Column  |         Type          |                         Modifiers                          
----------+-----------------------+------------------------------------------------------------
 id       | integer               | not null default nextval('foo3_text_id_seq'::regclass)
 beleg_id | integer               | not null
 index_id | character varying(32) | not null
 value    | text                  | not null
 content  | tsvector              | 
Indexes:
    "foo3_text_pkey" PRIMARY KEY, btree (id)
    "foo3_text_index_id_2685e3637668d5e7_uniq" UNIQUE CONSTRAINT, btree (index_id, beleg_id)
    "foo3_text_beleg_id" btree (beleg_id)
    "foo3_text_content_idx" gin (content)
    "foo3_text_index_id" btree (index_id)
    "foo3_text_index_id_like" btree (index_id varchar_pattern_ops)
Foreign-key constraints:
    "beleg_id_refs_id_6e6d40770e71292" FOREIGN KEY (beleg_id) REFERENCES foo3_beleg(id) DEFERRABLE INITIALLY DEFERRED
    "index_id_refs_name_341600137465c2f9" FOREIGN KEY (index_id) REFERENCES foo3_index(name) DEFERRABLE INITIALLY DEFERRED

Hardware changes (SSDs instead of traditional disks) or a RAM disk are possible options. But perhaps the current hardware can deliver faster results, too.

Version: PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu

Please leave a comment if you need more details.

Solution

Postgres gives you a chance to do some runtime configuration that tells the query planner how to weigh the cost of your I/O operations.

random_page_cost (floating point) (reference) is what may help you. It basically sets the ratio of I/O cost to CPU cost.

A higher value means I/O is expensive ("I have a sequential, spinning disk"); a lower value means I/O is cheap ("I have a random-access disk").

The default value is 4.0; maybe you want to increase it and test whether your query takes less time.
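For example, a minimal sketch of trying a different value, first per session and then, if it proves useful, persistently; the value 8.0 is purely illustrative:

SET random_page_cost = 8.0;        -- session-local; illustrative value, test your own
EXPLAIN ANALYZE
SELECT beleg_id FROM foo3_text WHERE content @@ 'footown'::tsquery;
RESET random_page_cost;            -- back to the configured default

-- to keep a value that helped (takes effect for new connections to this database):
ALTER DATABASE foo3_bar_p SET random_page_cost = 8.0;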

Do not forget that your I/O priority will depend on your column count and row count.

A big BUT: since your indexes are btrees, the CPU priority goes down much faster than the I/O priority goes up. You can basically map the complexities to the priorities:

CPU Priority = O(log(x))
I/O Priority = O(x)

All in all, this means that if Postgres's value of 4.0 is right for 100k entries, you should set it to approximately (4.0 * log(100k) * 10M) / (log(10M) * 100k) for 10M entries.
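For concreteness, that expression can be evaluated directly in psql (the base of the logarithm cancels out of the ratio, so Postgres's base-10 log() is fine; the column alias is just a label):

SELECT (4.0 * log(100000) * 10000000) / (log(10000000) * 100000) AS scaled_value;
-- log(100000) = 5 and log(10000000) = 7, so this is (4.0 * 5 * 10M) / (7 * 100k) ≈ 285.7

Whether a value that large actually helps should, as above, be verified by re-running EXPLAIN ANALYZE on the real query.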
