在PostgreSQL中索引外键 [英] Indexing Foreign Keys in Postgresql

查看:80
本文介绍了在PostgreSQL中索引外键的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

就像许多Postgres n00bs 一样,我们有很多带有外键约束的表未编入索引.在某些情况下,这应该不会对性能造成重大影响-但这有待进一步分析.

Like many Postgres n00bs we have a lot of tables with foreign key constraints that are not indexed. I some cases this should not be a big performance hit - but this would be subject for further analysis.

我已阅读以下文章: https://www.cybertec-postgresql.com/en/index-your-foreign-key/

并使用以下查询查找没有索引的所有外键:

And used the following query to find all foreign keys without an index:

SELECT c.conrelid::regclass AS "table",
       /* list of key column names in order */
       string_agg(a.attname, ',' ORDER BY x.n) AS columns,
       pg_catalog.pg_size_pretty(
          pg_catalog.pg_relation_size(c.conrelid)
       ) AS size,
       c.conname AS constraint,
       c.confrelid::regclass AS referenced_table
FROM pg_catalog.pg_constraint c
   /* enumerated key column numbers per foreign key */
   CROSS JOIN LATERAL
      unnest(c.conkey) WITH ORDINALITY AS x(attnum, n)
   /* name for each key column */
   JOIN pg_catalog.pg_attribute a
      ON a.attnum = x.attnum
         AND a.attrelid = c.conrelid
WHERE NOT EXISTS
        /* is there a matching index for the constraint? */
        (SELECT 1 FROM pg_catalog.pg_index i
         WHERE i.indrelid = c.conrelid
           /* the first index columns must be the same as the
              key columns, but order doesn't matter */
           AND (i.indkey::smallint[])[0:cardinality(c.conkey)-1]
               @> c.conkey::int[])
  AND c.contype = 'f'
GROUP BY c.conrelid, c.conname, c.confrelid
ORDER BY pg_catalog.pg_relation_size(c.conrelid) DESC;

这向我显示了具有复合唯一约束的表,该表仅在唯一索引中的列中一个":

This shows me for tables with composite unique constraints only "one" of the columns in the unique index:

\d topics_items;
-----------------+---------+--------------+---------------+------------------------------
 topics_items_id | integer |              | not null      | generated always as identity
 topic_id        | integer |              | not null      |
 item_id         | integer |              | not null      |
Index:
    "topics_items_pkey" PRIMARY KEY, btree (topics_items_id)
    "topic_id_item_id_unique" UNIQUE CONSTRAINT, btree (topic_id, item_id)
Foreign Keys:
    "topics_items_item_id_fkey" FOREIGN KEY (item_id) REFERENCES items(item_id) ON DELETE CASCADE
    "topics_items_topic_id_fkey" FOREIGN KEY (topic_id) REFERENCES topics(topic_id) ON DELETE CASCADE

在这种情况下,检查查询仅将 item_id 而不是 topic_id 查找为未索引字段.

In this case the check query only finds the item_id and not the topic_id as an not indexed field.

可以公平地说,这只是所使用查询的一个问题,我必须分别对两个字段(topic_id和item_id)进行索引-还是涉及一些黑色巫术,而仅涉及 item_id 需要索引吗?

Is it fair to say, that this is just an issue of the query used, and I have to separately index both fields (topic_id and item_id) - or is there some black sorcery involved and only the item_id needs an index?

推荐答案

tl; dr 您需要在 item_id 上添加索引. 11中介绍了Postgres索引编制的黑魔法".索引.

您在(topic_id,item_id)上有一个复合索引,列的顺序很重要.Postgres可以使用它为 topic_id 上的查询建立索引,对 topic_id item_id 上的查询建立索引,但不能(或效率较低) item_id .

You have a composite index on (topic_id, item_id) and column order is important. Postgres can use this to index queries on topic_id, queries on both topic_id and item_id, but not (or less efficiently) item_id alone.

来自 11.3.多列索引 ...

多列B树索引可用于涉及该索引列的任何子集的查询条件,但是当前(最左边)列有约束时,该索引效率最高.

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.

-- indexed
select *
from topics_items
where topic_id = ?

-- also indexed
select *
from topics_items
where topic_id = ?
  and item_id = ?

-- probably not indexed
select *
from topics_items
where item_id = ?

这是因为像(topic_id,item_id)这样的复合索引会先存储主题ID,然后存储同样具有该主题ID的项目ID.为了有效地在此索引中查找商品ID,Postgres必须首先使用主题ID缩小搜索范围.

This is because a composite index like (topic_id, item_id) stores the topic ID first, then an item IDs which also have that topic ID. In order to look up an item ID efficiently in this index, Postgres must first narrow the search with a topic ID.

Postgres 认为值得这样做,则可以将其反转.如果可能的主题ID数量很少,而可能的索引ID数量很多,则会在每个主题ID中搜索索引ID.

Postgres can reverse an index if it thinks it's worth the effort. If there's a small number of possible topic IDs, and a large number of possible index IDs, it will search for the index ID in each topic ID.

例如,假设您有10个可能的主题ID和1000个可能的商品ID,并且索引为(topic_id,index_id).这就像有10个带有清晰标签的主题ID桶,每个桶中有1000个带有清晰标签的商品ID桶.要进入商品ID桶,它必须在每个主题ID桶内查看.要在 where item_id = 23 上使用此索引,Postgres必须在10个主题ID桶中搜索每个ID为23的桶.

For example, let's say you have 10 possible topic IDs and 1000 possible item IDs and your index (topic_id, index_id). This is like having 10 clearly labelled topic ID buckets each with 1000 clearly labelled item ID buckets inside. To get to the item ID buckets, it must look inside each topic ID bucket. To use this index on where item_id = 23 Postgres must search each of the 10 topic ID buckets for all the buckets with item ID 23.

但是,如果您有1000个可能的主题ID和10个可能的商品ID,则Postgres将不得不搜索1000个主题ID存储桶.很有可能它将进行全表扫描.在这种情况下,您需要反转索引并将其设置为(item_id,topic_id).

But if you have 1000 possible topic IDs and 10 possible item IDs, Postgres would have to search 1000 topic IDs buckets. Most likely it will do a full table scan instead. In this case you'd want to reverse your index and make it (item_id, topic_id).

这在很大程度上取决于是否具有良好的表统计信息,这意味着确保自动清理工作正常.

This depends heavily on having good table statistics, which means making sure autovacuum is working properly.

因此,如果一列的可变性远低于另一列,那么您就可以摆脱针对两列的单个索引.

So you can get away with a single index for two columns, if one column has far less variability than another.

Postgres也可以使用多索引,如果它认为可以查询运行速度更快.例如,如果您在 topic_id 上有一个索引,在 item_id 上有一个索引,则它可以使用这两个索引并将结果合并.例如,其中topic_id = 23或item_id = 42 可以使用topic_id索引搜索主题ID 23,并使用item_id索引搜索项目ID 42,然后将结果合并.

Postgres can also use mulitple indexes if it thinks it will make the query run faster. For example, if you had an index on topic_id and an index on item_id, it can use both indexes and combine the results. For example where topic_id = 23 or item_id = 42 could use the topic_id index to search for topic ID 23, and the item_id index to search for item ID 42, then combine the results.

通常比具有复合(topic_id,item_id)索引要慢.它也可能比使用单个索引慢,因此如果Postgres决定不使用多个索引也不要感到惊讶.

This is generally slower than having a composite (topic_id, item_id) index. It can also be slower than using a single index, so don't be surprised if Postgres decides not to use multiple indexes.

通常,对于b树索引,当您有两列时,将具有三种可能的组合.

In general, for b-tree indexes, when you have two columns you have three possible combinations.

  • a + b
  • a
  • b

您需要两个索引.

  • (a,b)-a和a + b
  • (b)-b

(a,b)涵盖了对a和a + b的搜索.(b)涵盖了搜索 b 的情况.

(a, b) covers both searches for a and a + b. (b) covers searching for b.

当您拥有三列时,就有七个可能的组合.

When you have three columns, you have seven possible combinations.

  • a + b + c
  • a + b
  • a + c
  • a
  • b + c
  • b
  • c

但是您只需要三个索引.

But you only need three indexes.

  • (a,b,c)-a,a + b,a + b + c
  • (b,c)-b,b + c
  • (c,a)-c,c + a

但是,您实际上可能希望避免在三列上建立索引.通常慢些.你真正想要的是这个.

However, you probably actually want to avoid having an index on three columns. It's often slower. What you actually want is this.

  • (a,b)
  • (b,c)
  • (c,a)

应谨慎使用多列索引.在大多数情况下,单个列上的索引就足够了,并且可以节省空间和时间.除非表的使用风格极高,否则具有三列以上的索引不太可能有帮助.

Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves space and time. Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.

从索引中读取要比从表中读取慢.您希望索引减少必须读取的行数,但是您不希望Postgres进行不必要的索引扫描.

Reading from an index is slower than reading from the table. You want your indexes to reduce the number of rows which must be read, but you don't want Postgres to have to do any more index scanning than necessary.

在索引中检查了右列上的约束...,因此它们可以适当地保存对表的访问,但不会减少索引中必须扫描的部分.例如,给定(a,b,c)的索引和查询条件,其中a = 5 AND b> = 42 AND c<.77,则必须从a = 5和b = 42的第一个条目到a = 5的最后一个条目扫描索引,而c> = 77的索引条目将被跳过,但仍然必须扫描通过.

Constraints on columns to the right... are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through.

这篇关于在PostgreSQL中索引外键的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆