提高查询速度:大postgres表中的简单SELECT [英] Improving query speed: simple SELECT in big postgres table

查看:537
本文介绍了提高查询速度:大postgres表中的简单SELECT的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Postgres数据库上的SELECT查询中遇到速度问题。

I'm having trouble regarding speed in a SELECT query on a Postgres database.

我有一个包含两个整数列作为键的表:(int1,int2)
该表大约有7,000万行。

I have a table with two integer columns as key: (int1,int2) This table has around 70 million rows.

在这种环境下,我需要进行两种简单的SELECT查询:

I need to make two kind of simple SELECT queries in this environment:

SELECT * FROM table WHERE int1=X;
SELECT * FROM table WHERE int2=X;

这两个选择从这7000万行中分别返回大约10.000行。为了尽可能快地工作,我考虑使用两个HASH索引,每列一个。不幸的是结果不是很好:

These two selects returns around 10.000 rows each out of these 70 million. For this to work as fast as possible I thought on using two HASH indexes, one for each column. Unfortunately the results are not that good:

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on lec_sim  (cost=232.21..25054.38 rows=6565 width=36) (actual time=14.759..23339.545 rows=7871 loops=1)
   Recheck Cond: (lec2_id = 11782)
   ->  Bitmap Index Scan on lec_sim_lec2_hash_ind  (cost=0.00..230.56 rows=6565 width=0) (actual time=13.495..13.495 rows=7871 loops=1)
         Index Cond: (lec2_id = 11782)
 Total runtime: 23342.534 ms
(5 rows)

这是一个EXPLAIN ANALYZE示例的示例这些查询。这大约需要23秒。我的期望是在不到一秒钟的时间内获得此信息。

This is an EXPLAIN ANALYZE example of one of these queries. It is taking around 23 seconds. My expectations are to get this information in less than a second.

这些是postgres db配置的一些参数:

These are some parameters of the postgres db config:

work_mem = 128MB
shared_buffers = 2GB
maintenance_work_mem = 512MB
fsync = off
synchronous_commit = off
effective_cache_size = 4GB

任何帮助,评论或想法都将不胜感激。

Any help, comment or thought would be really appreciated.

预先谢谢您。

推荐答案

将我的评论提取为答案:这里的索引查找非常快- -所有时间都花在检索实际行上。 23秒/ 7871行=每行2.9毫秒,这对于检索分散在磁盘子系统中的数据是合理的。搜寻很慢;您可以a)将数据集放入RAM,b)购买SSD或c)提前整理数据以最大程度地减少寻道。

Extracting my comments into an answer: the index lookup here was very fast -- all the time was spent retrieving the actual rows. 23 seconds / 7871 rows = 2.9 milliseconds per row, which is reasonable for retrieving data that's scattered across the disk subsystem. Seeks are slow; you can a) fit your dataset in RAM, b) buy SSDs, or c) organize your data ahead of time to minimize seeks.

PostgreSQL 9.2具有称为<一个href = http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.2#Index-only_scans>仅索引扫描,它可以(通常)在不访问表的情况下回答查询。您可以将其与自动维护订单的 btree 索引属性结合使用,以使查询速度更快。您提到 int1 int2 和两个浮点数:

PostgreSQL 9.2 has a feature called index-only scans that allows it to (usually) answer queries without accessing the table. You can combine this with the btree index property of automatically maintaining order to make this query fast. You mention int1, int2, and two floats:

CREATE INDEX sometable_int1_floats_key ON sometable (int1, float1, float2);
CREATE INDEX sometable_int2_floats_key ON sometable (int2, float1, float2);

SELECT float1,float2 FROM sometable WHERE int1=<value>; -- uses int1 index
SELECT float1,float2 FROM sometable WHERE int2=<value>; -- uses int2 index

请注意,这并不能神奇地擦除磁盘搜索,它只是移动它们从查询时间到插入时间。由于您要复制数据,因此也会浪费您的存储空间。不过,这可能是您想要的权衡。

Note also that this doesn't magically erase the disk seeks, it just moves them from query time to insert time. It also costs you storage space, since you're duplicating the data. Still, this is probably the trade-off you want.

这篇关于提高查询速度:大postgres表中的简单SELECT的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆