Why is this PostGIS distance query so slow? Postgres' query estimator off by a factor of 10000x?

Question

I'm trying to find all posts within a certain distance of a point, but for some inputs the query is extremely slow. Here are some examples:

-- fast (1 millisecond)
SELECT 1
FROM post po
WHERE ST_DWithin(po.geog, ST_SetSRID(ST_MakePoint(-47, -70), 4326)::geography, 4500 * 1609.34)
LIMIT 10;

-- slow (2 seconds)
SELECT 1
FROM post po
WHERE (po.geog <-> ST_SetSRID(ST_MakePoint(-47, -70), 4326)::geography) < 4500 * 1609.34
LIMIT 10;

-- slow (9 seconds)
SELECT 1
FROM post po
WHERE ST_DWithin(po.geog, ST_SetSRID(ST_MakePoint(-70, 40), 4326)::geography, 4500 * 1609.34)
ORDER BY po.reply_count DESC, convo_id DESC
LIMIT 10;

-- fast (1 millisecond)
SELECT 1
FROM post po
WHERE (po.geog <-> ST_SetSRID(ST_MakePoint(-70, 40), 4326)::geography) < 4500 * 1609.34
ORDER BY po.reply_count DESC, convo_id DESC
LIMIT 10;

Here is the visualization of the EXPLAIN ANALYZE for the third query that is taking 9 seconds: https://explain.depesz.com/s/Xd6d

And here is the EXPLAIN ANALYZE for the fourth query: https://explain.depesz.com/s/zcKa

Basically, depending on the inputs, the non-indexed distance comparison using <-> is sometimes faster, while for other inputs the index-assisted distance check (ST_DWithin) is faster.

I think that ST_DWithin should basically always be faster (or at least complete in a reasonable amount of time), but for some reason in this case its runtime is extremely long. Does anyone know why the query planner is so far off? Based on the EXPLAIN output, Postgres expects about 100 rows, but there are actually about 1,000,000 rows.

Here are the relevant indexes that I have:

CREATE UNIQUE INDEX post_pk ON public.post USING btree (convo_id)
CREATE INDEX post_geog_spidx ON public.post USING spgist (geog)
CREATE INDEX post_reply_count_convo_id_idx ON public.post USING btree (reply_count, convo_id)
CREATE INDEX post_reply_count_idx ON public.post USING btree (reply_count)

Using a GiST index instead of an SP-GiST index on geog did not affect the runtime.

All my geographies are points, and I have already run VACUUM (ANALYSE, VERBOSE);

My versions are:

PostgreSQL 12.0, compiled by Visual C++ build 1914, 64-bit

POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.4.3 (Internal)" TOPOLOGY

Recommended answer

Estimating the number of rows which will be returned by these types of multidimensional operations is very hard. It would require types of statistics which PostgreSQL doesn't gather.

So it is mostly up to you to figure out how many rows will qualify (before the LIMIT) and to craft your query accordingly. Is your ST_DWithin always so unrestrictive? Do you know ahead of time when it is going to be so unrestrictive?
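As a rough sketch of what "crafting the query" could look like, the statements below reuse the table, columns, and index names from the question. The idea of counting first and then choosing between two query shapes is an assumption about how an application might handle this, not something the planner does for you, and the count itself can be expensive on a large table.

```sql
-- Step 1: measure how many rows pass the distance filter for a given point.
SELECT count(*)
FROM post po
WHERE ST_DWithin(po.geog, ST_SetSRID(ST_MakePoint(-70, 40), 4326)::geography, 4500 * 1609.34);

-- Step 2a: if few rows qualify, let the spatial index find them and sort the small result.
SELECT po.convo_id
FROM post po
WHERE ST_DWithin(po.geog, ST_SetSRID(ST_MakePoint(-70, 40), 4326)::geography, 4500 * 1609.34)
ORDER BY po.reply_count DESC, po.convo_id DESC
LIMIT 10;

-- Step 2b: if most rows qualify, use the <-> comparison instead. In a WHERE clause it
-- cannot use the spatial index, so the planner is free to walk
-- post_reply_count_convo_id_idx backwards and stop after 10 matches.
SELECT po.convo_id
FROM post po
WHERE (po.geog <-> ST_SetSRID(ST_MakePoint(-70, 40), 4326)::geography) < 4500 * 1609.34
ORDER BY po.reply_count DESC, po.convo_id DESC
LIMIT 10;
```

Shape 2a matches the fast first query in the question and shape 2b the fast fourth query; which one wins depends on how selective the distance filter is for the chosen point.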

I'm not a GIS expert, but note the units: the coordinates of SRID 4326 are in degrees, yet because the values are cast to geography, ST_DWithin measures the distance in meters. 4500 * 1609.34 m is 4500 miles, so for a query point in a densely populated region a large share of all posts will fall within that radius, won't it?
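To put the radius from the question in concrete terms, here is a small arithmetic sketch in plain PostgreSQL (no PostGIS needed). It relies on the fact that ST_DWithin on geography arguments takes its distance in meters. The mean Earth radius of 6371008.8 m is an assumed approximation, and the spherical-cap formula treats the Earth as a perfect sphere.

```sql
SELECT
    4500 * 1609.34                AS radius_m,    -- 4500 miles expressed in meters
    round(4500 * 1609.34 / 1000)  AS radius_km,   -- the same radius in kilometers
    -- Share of a sphere's surface within that great-circle distance of a point:
    -- (1 - cos(r / R)) / 2, with R an assumed mean Earth radius in meters.
    (1 - cos((4500 * 1609.34 / 6371008.8)::double precision)) / 2 AS share_of_sphere;
```

The radius comes out at about 7,242 km and the share at roughly 0.29 of the globe. Posts are not spread uniformly over the Earth, so the share of rows that qualify can be far higher than that near populated areas and far lower near a point such as (-47, -70).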
