Spatial query on large table with multiple self joins performing slow


Question

I am working on queries against a large table in Postgres 9.3.9. It is a spatial dataset and it is spatially indexed. Say I need to find 3 types of objects: A, B and C. The criterion is that B and C are both within a certain distance of A, say 500 meters.

My query looks like this:

select 
  school.osm_id as school_osm_id, 
  school.name as school_name, 
  school.way as school_way, 
  restaurant.osm_id as restaurant_osm_id, 
  restaurant.name as restaurant_name, 
  restaurant.way as restaurant_way, 
  bar.osm_id as bar_osm_id, 
  bar.name as bar_name, 
  bar.way as bar_way 
from (
    select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'school') as school, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'restaurant') as restaurant, 
   (select osm_id, name, amenity, way, way_geo 
    from planet_osm_point 
    where amenity = 'bar') as bar 
where ST_DWithin(school.way_geo, restaurant.way_geo, 500, false) 
  and ST_DWithin(school.way_geo, bar.way_geo, 500, false);

This query gives me what I want, but it takes a really long time to execute, around 13 seconds. I'm wondering if there is another way to write the query that makes it more efficient.

Query plan:

Nested Loop  (cost=74.43..28618.65 rows=1 width=177) (actual time=33.513..11235.212 rows=10591 loops=1)
   Buffers: shared hit=530967 read=8733
   ->  Nested Loop  (cost=46.52..28586.46 rows=1 width=174) (actual time=31.998..9595.212 rows=4235 loops=1)
         Buffers: shared hit=389863 read=8707
         ->  Bitmap Heap Scan on planet_osm_point  (cost=18.61..2897.83 rows=798 width=115) (actual time=7.862..150.607 rows=8811 loops=1)
               Recheck Cond: (amenity = 'school'::text)
               Buffers: shared hit=859 read=5204
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=5.416..5.416 rows=8811 loops=1)
                     Index Cond: (amenity = 'school'::text)
                     Buffers: shared hit=3 read=24
         ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_1  (cost=27.91..32.18 rows=1 width=115) (actual time=1.064..1.069 rows=0 loops=8811)
               Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'restaurant'::text))
               Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
               Rows Removed by Filter: 0
               Buffers: shared hit=389004 read=3503
               ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=1.058..1.058 rows=0 loops=8811)
                     Buffers: shared hit=384528 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.193..0.193 rows=64 loops=8811)
                           Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                           Buffers: shared hit=146631 read=2841
                     ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.843..0.843 rows=6291 loops=8811)
                           Index Cond: (amenity = 'restaurant'::text)
                           Buffers: shared hit=237897
   ->  Bitmap Heap Scan on planet_osm_point planet_osm_point_2  (cost=27.91..32.18 rows=1 width=115) (actual time=0.375..0.383 rows=3 loops=4235)
         Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'bar'::text))
         Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
         Rows Removed by Filter: 1
         Buffers: shared hit=141104 read=26
         ->  BitmapAnd  (cost=27.91..27.91 rows=1 width=0) (actual time=0.368..0.368 rows=0 loops=4235)
               Buffers: shared hit=127019
               ->  Bitmap Index Scan on idx_planet_osm_point_waygeo  (cost=0.00..9.05 rows=137 width=0) (actual time=0.252..0.252 rows=363 loops=4235)
                     Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                     Buffers: shared hit=101609
               ->  Bitmap Index Scan on idx_planet_osm_point_amenity  (cost=0.00..18.41 rows=798 width=0) (actual time=0.104..0.104 rows=779 loops=4235)
                     Index Cond: (amenity = 'bar'::text)
                     Buffers: shared hit=25410
 Total runtime: 11238.605 ms

I'm only using one table at the moment, with 1,372,711 rows. It has 73 columns:

       Column       |         Type         |       Modifiers
--------------------+----------------------+---------------------------
 osm_id             | bigint               | 
 access             | text                 | 
 addr:housename     | text                 | 
 addr:housenumber   | text                 | 
 addr:interpolation | text                 | 
 admin_level        | text                 | 
 aerialway          | text                 | 
 aeroway            | text                 | 
 amenity            | text                 | 
 area               | text                 | 
 barrier            | text                 | 
 bicycle            | text                 | 
 brand              | text                 | 
 bridge             | text                 | 
 boundary           | text                 | 
 building           | text                 | 
 capital            | text                 | 
 construction       | text                 | 
 covered            | text                 | 
 culvert            | text                 | 
 cutting            | text                 | 
 denomination       | text                 | 
 disused            | text                 | 
 ele                | text                 | 
 embankment         | text                 | 
 foot               | text                 | 
 generator:source   | text                 | 
 harbour            | text                 | 
 highway            | text                 | 
 historic           | text                 | 
 horse              | text                 | 
 intermittent       | text                 | 
 junction           | text                 | 
 landuse            | text                 | 
 layer              | text                 | 
 leisure            | text                 | 
 lock               | text                 | 
 man_made           | text                 | 
 military           | text                 | 
 motorcar           | text                 | 
 name               | text                 | 
 natural            | text                 | 
 office             | text                 | 
 oneway             | text                 | 
 operator           | text                 | 
 place              | text                 | 
 poi                | text                 | 
 population         | text                 | 
 power              | text                 | 
 power_source       | text                 | 
 public_transport   | text                 | 
 railway            | text                 | 
 ref                | text                 | 
 religion           | text                 | 
 route              | text                 | 
 service            | text                 | 
 shop               | text                 | 
 sport              | text                 | 
 surface            | text                 | 
 toll               | text                 | 
 tourism            | text                 | 
 tower:type         | text                 | 
 tunnel             | text                 | 
 water              | text                 | 
 waterway           | text                 | 
 wetland            | text                 | 
 width              | text                 | 
 wood               | text                 | 
 z_order            | integer              | 
 tags               | hstore               | 
 way                | geometry(Point,4326) | 
 way_geo            | geography            | 
 gid                | integer              | not null default nextval('...
Indexes:
    "planet_osm_point_pkey1" PRIMARY KEY, btree (gid)
    "idx_planet_osm_point_amenity" btree (amenity)
    "idx_planet_osm_point_waygeo" gist (way_geo)
    "planet_osm_point_index" gist (way)
    "planet_osm_point_pkey" btree (osm_id)

There are 8811, 6291 and 779 rows for the amenities school, restaurant and bar, respectively.

Recommended answer

This query should go a long way (be much faster):

WITH school AS (
   SELECT s.osm_id AS school_id, text 'school' AS type, s.osm_id, s.name, s.way_geo
   FROM   planet_osm_point s
        , LATERAL (
      SELECT  1 FROM planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'bar'
      LIMIT   1  -- bar exists -- most selective first if possible
      ) b
        , LATERAL (
      SELECT  1 FROM planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'restaurant'
      LIMIT   1  -- restaurant exists
      ) r
   WHERE  s.amenity = 'school'
   )
SELECT * FROM (
   TABLE school  -- schools

   UNION ALL  -- bars
   SELECT s.school_id, 'bar', x.*
   FROM   school s
        , LATERAL (
      SELECT  osm_id, name, way_geo
      FROM    planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'bar'
      ) x

   UNION ALL  -- restaurants
   SELECT s.school_id, 'rest.', x.*
   FROM   school s
        , LATERAL (
      SELECT  osm_id, name, way_geo
      FROM    planet_osm_point
      WHERE   ST_DWithin(way_geo, s.way_geo, 500, false)
      AND     amenity = 'restaurant'
      ) x
   ) sub
ORDER BY school_id, (type <> 'school'), type, osm_id;

This is not the same as your original query, but rather what you actually want:

I want a list of schools that have restaurants and bars within 500 meters and I need the coordinates of each school and its corresponding restaurants and bars.

So this query returns a list of those schools, each followed by the bars and restaurants nearby. Each set of rows is held together by the osm_id of the school in the column school_id.

It now uses LATERAL joins to make use of the spatial GiST index.
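The two LATERAL subqueries with LIMIT 1 in the CTE act as existence checks. As a rough sketch of the same idea, and not part of the original answer, the list of qualifying schools could also be written with EXISTS. This assumes the table and column names from the question. Whether the planner picks a better or worse plan than the LATERAL form is not guaranteed, so compare both with EXPLAIN ANALYZE.

```sql
-- Sketch: schools that have at least one bar and one restaurant within 500 m.
-- EXISTS stops at the first match, like LATERAL (... LIMIT 1).
SELECT s.osm_id AS school_id, s.name, s.way_geo
FROM   planet_osm_point s
WHERE  s.amenity = 'school'
AND    EXISTS (
   SELECT 1
   FROM   planet_osm_point b
   WHERE  b.amenity = 'bar'   -- most selective check first
   AND    ST_DWithin(b.way_geo, s.way_geo, 500, false)
   )
AND    EXISTS (
   SELECT 1
   FROM   planet_osm_point r
   WHERE  r.amenity = 'restaurant'
   AND    ST_DWithin(r.way_geo, s.way_geo, 500, false)
   );
```

This only returns the schools. The nearby bars and restaurants still have to be fetched in a second step, as the UNION ALL branches of the query above do.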

TABLE school is just shorthand for SELECT * FROM school.

The expression (type <> 'school') sorts the school first in each set, because false sorts before true in Postgres.

The subquery sub in the final SELECT is only needed to order by this expression. A UNION query limits an attached ORDER BY list to result column names only, no expressions.
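A toy illustration of that restriction, using made-up literal rows instead of the real table:

```sql
-- Rejected by Postgres: ORDER BY attached directly to a UNION may only
-- reference result column names, not expressions.
-- SELECT 1 AS school_id, text 'school' AS type
-- UNION ALL
-- SELECT 1, 'bar'
-- ORDER BY (type <> 'school');

-- Works: wrap the UNION in a subquery, then order by the expression.
SELECT *
FROM  (
   SELECT 1 AS school_id, text 'school' AS type
   UNION ALL
   SELECT 1, 'bar'
   UNION ALL
   SELECT 1, 'rest.'
   ) sub
ORDER  BY school_id, (type <> 'school'), type;
```

The school row comes out first within its group because (type <> 'school') evaluates to false for it, and the remaining rows are then ordered by type.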

I focus on the query you presented for the purpose of this answer, ignoring the extended requirement to filter on any of the other 70 text columns. That's really a design flaw. The search criteria should be concentrated in a few columns. Otherwise you would have to index all 70 columns, and multicolumn indexes like the one I am going to propose are hardly an option. Still possible though ...

In addition to the existing index:

"idx_planet_osm_point_waygeo" gist (way_geo)

If you always filter on the same column, you could create a multicolumn index covering the few columns you are interested in, so that index-only scans become possible:

CREATE INDEX planet_osm_point_bar_idx ON planet_osm_point (amenity, name, osm_id)
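One way to check whether such an index actually yields an index-only scan is sketched below. Index-only scans depend on the visibility map, which VACUUM maintains, and the query plan in the question estimated 798 school rows where 8811 were found, which suggests the statistics may also be out of date. The test query is only an illustration and the planner may still choose a different scan type.

```sql
-- Refresh the visibility map and the planner statistics.
VACUUM ANALYZE planet_osm_point;

-- Look for "Index Only Scan" and a low "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT amenity, name, osm_id
FROM   planet_osm_point
WHERE  amenity = 'bar';
```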

Postgres 9.5

The upcoming Postgres 9.5 introduces major improvements that happen to address your case exactly:

  • Allow queries to perform accurate distance filtering of bounding-box-indexed objects (polygons, circles) using GiST indexes (Alexander Korotkov, Heikki Linnakangas)

Previously, a common table expression was required to return a large number of rows ordered by bounding-box distance, and then filtered further with a more accurate non-bounding-box distance calculation.

  • Allow GiST indexes to perform index-only scans (Anastasia Lubennikova, Heikki Linnakangas, Andreas Karlsson)

That's of particular interest for you. Now you can have a single multicolumn (covering) GiST index:

-- index name is illustrative; the table is the one from the question
CREATE INDEX planet_osm_point_amenity_geo_idx ON planet_osm_point
USING gist(amenity, way_geo, name, osm_id);
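Core Postgres does not ship default GiST operator classes for plain scalar types such as text and bigint. The btree_gist extension supplies them, so a multicolumn GiST index that mixes amenity, name and osm_id with a geography column will most likely need it installed first. A minimal sketch, assuming you have the privileges to create extensions:

```sql
-- Provides GiST operator classes for text, bigint and other scalar types.
-- Run this before creating the multicolumn GiST index.
CREATE EXTENSION IF NOT EXISTS btree_gist;
```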

And:

  • Improve bitmap index scan performance (Teodor Sigaev, Tom Lane)

And:

  • Add GROUP BY analysis functions GROUPING SETS, CUBE and ROLLUP (Andrew Gierth, Atri Sharma)

Why? Because ROLLUP would simplify the query I suggested.

The first alpha version was released on July 2, 2015. The expected timeline for the release:

This is the alpha release of version 9.5, indicating that some changes to features are still possible before release. The PostgreSQL Project will release 9.5 beta 1 in August, and then periodically release additional betas as required for testing until the final release in late 2015.

Basics

Of course, be sure not to overlook the basics:
