PostgreSQL Citext索引与较低的表达式索引性能 [英] PostgreSQL citext index vs lower expression index performance
问题描述
我想决定使用带有索引的 citext
列还是使用带有索引的 text
列 lower()
。
I want to decide between using a citext
column with an index or using a text
column with an index on lower()
.
我执行了一些基准测试。令我惊讶的是,使用 lower()
上的索引进行搜索会导致索引扫描,但是在 citext
的情况下,仅索引扫描。我期望 lower()
上的索引也导致索引也只扫描。
I performed some benchmarks. To my surprise the search with the index on lower()
causes an index scan, but in the citext
case I get an index only scan. I was expecting the index on lower()
to cause an index only scan too.
另外,总成本索引为 citext
的索引为4.44,而索引为 lower()
的索引的总费用为8.44。
Also, the total cost with the citext
index is 4.44, but with the index on lower()
, the total cost is 8.44.
因此,我首先想到的是 citext
列索引比 text
列。
So the first thing coming to my mind is that the citext
column index is better than the expression index on a text
column.
CREATE TABLE test_citext(a citext NOT NULL);
INSERT INTO test_citext
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
VACUUM (FREEZE, ANALYZE) test_citext;
create index citextind on test_citext(a);
Select * from test_citext where a = 'test';
--Index Only Scan.Total cost 4.44
CREATE TABLE test_textlowerindex(a text NOT NULL);
INSERT INTO test_textlowerindex
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
VACUUM (FREEZE, ANALYZE) test_textlowerindex;
create index lowertextind on test_textlowerindex(lower(a));
Select * from test_textlowerindex where lower(a) = 'test';
--Index Scan.Total cost 8.44
我对吗?
Laurenz先生先生,谢谢您的回答。我如您所说更改了我的上述脚本。
结果:
Mister Laurenz Albe thanks for your answer.I changed my above script as you said. Result :
CREATE TABLE test_citext(a citext NOT NULL);
INSERT INTO test_citext
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
create index citextind on test_citext(a);
VACUUM (FREEZE, ANALYZE) test_citext;
Select count(*) from test_citext where a = 'test';
--Index Only Scan 4.44 + 4.46
CREATE TABLE test_textlowerindex(a text NOT NULL);
INSERT INTO test_textlowerindex
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
create index lowertextind on test_textlowerindex(lower(a));
VACUUM (FREEZE, ANALYZE) test_textlowerindex;
Select count(*) from test_textlowerindex where lower(a) = 'test';
--Index Scan 8.44 + 8.46
但是并没有改变,即使我在创建索引并在select中使用count(*)之后运行分析。IndexScan仍会继续在Lower()上进行索引。
But anything did not changed Even If I run analyze after creating index and using count(*) in select.Index Scan still continues with an index on lower().
推荐答案
您的测试具有误导性。这里有两个问题:
Your test is misleading. There are two problems here:
-
您没有运行
ANALYZE
在创建索引lowertextind
之后。
否则,PostgreSQL不知道如何降低 lower( a)
是分布式的,可能会产生错误的成本估算。
Without that, PostgreSQL doesn't know how lower(a)
is distributed and will probably produce a wrong cost estimate.
通过使用 SELECT *
您无意中允许将仅索引扫描用于第一个查询,但不能用于第二个查询。这是因为第一个索引包含所有表列,但第二个索引不包含。
By using SELECT *
you inadvertently allowed an index only scan to be used for the first query, but not for the second. This is because the first index contains all table columns, but the second doesn't.
因为第二个索引不包含 a
,必须从表中获取该值,这会导致其他工作。
Since the second index doesn't contain a
, the value has to be fetched from the table, causing additional work.
您可以使用 SELECT count(*)FROM ...
以获得更公平的基准。
You could use SELECT count(*) FROM ...
for a fairer benchmark.
这篇关于PostgreSQL Citext索引与较低的表达式索引性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!