Postgres - Is this the right way to create a partial index on a boolean column?


Problem description

I have the following table:

CREATE TABLE recipemetadata
(
  -- Lots of columns
  diet_glutenfree boolean NOT NULL
);

Almost every row will be set to FALSE unless someone comes up with some crazy new gluten-free diet that sweeps the country.

I need to be able to very quickly query for rows where this value is true. I've created the index:

CREATE INDEX IDX_RecipeMetadata_GlutenFree ON RecipeMetadata(diet_glutenfree) WHERE diet_glutenfree;

It appears to work, but I can't figure out how to tell whether it's indexing only the rows where the value is true. I want to make sure it's not doing something silly like indexing every row regardless of its value.

Should I add an operator to the WHERE clause, or is this syntax perfectly valid? Hopefully this isn't one of those super easy RTFM questions that will get downvoted 30 times.

UPDATE:

I've gone ahead and added 10,000 rows to RecipeMetadata with random values. I then did an ANALYZE on the table and a REINDEX just to be sure. When I run the query:

select recipeid from RecipeMetadata where diet_glutenfree;

I get:

'Seq Scan on recipemetadata  (cost=0.00..214.26 rows=5010 width=16)'
'  Filter: diet_glutenfree'

So, it appears to be doing a sequential scan on the table even though only about half the rows have this flag. The index is being ignored.

If I do:

select recipeid from RecipeMetadata where not diet_glutenfree;

I get:

'Seq Scan on recipemetadata  (cost=0.00..214.26 rows=5016 width=16)'
'  Filter: (NOT diet_glutenfree)'

So no matter what, this index is not being used.

Solution

I've confirmed the index works as expected.

I re-created the random data, only this time I set diet_glutenfree to random() > 0.9, so each row has only a 10% chance of being true.

I then re-created the indexes and tried the query again.
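For anyone who wants to reproduce this, here is a rough sketch of the setup. The scratch table recipemetadata_demo, its serial recipeid column, and the index name idx_demo_glutenfree are assumptions for illustration only; the real table has many more columns.

```sql
-- Hypothetical scratch table: only the columns needed for the test.
CREATE TABLE recipemetadata_demo
(
  recipeid serial PRIMARY KEY,          -- assumed key column
  diet_glutenfree boolean NOT NULL
);

-- 10,000 rows, each with roughly a 10% chance of being true.
INSERT INTO recipemetadata_demo (diet_glutenfree)
SELECT random() > 0.9
FROM generate_series(1, 10000);

-- Partial index: only rows matching the WHERE predicate get an entry.
CREATE INDEX idx_demo_glutenfree
  ON recipemetadata_demo (diet_glutenfree)
  WHERE diet_glutenfree;

-- Refresh planner statistics so the row estimates reflect the new data.
ANALYZE recipemetadata_demo;
```

Building the index after loading the data and running ANALYZE last means the planner sees the final distribution before any query is explained.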

SELECT RecipeId from RecipeMetadata where diet_glutenfree;

Returns:

'Index Scan using idx_recipemetadata_glutenfree on recipemetadata  (cost=0.00..135.15 rows=1030 width=16)'
'  Index Cond: (diet_glutenfree = true)'

And:

SELECT RecipeId from RecipeMetadata where NOT diet_glutenfree;

Returns:

'Seq Scan on recipemetadata  (cost=0.00..214.26 rows=8996 width=16)'
'  Filter: (NOT diet_glutenfree)'

It seems my first attempt was skewed: PG estimates that scanning the whole table is faster than going through the index if it has to load over half the rows anyway.
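One way to check whether the planner is skipping the index purely on cost is to discourage sequential scans for a single transaction and look at the plans again. This is only a sketch: enable_seqscan = off makes sequential scans look very expensive rather than forbidding them, and the exact plans depend on the data and on which other indexes exist.

```sql
BEGIN;

-- Planner hint for this transaction only; it is undone by the ROLLBACK.
SET LOCAL enable_seqscan = off;

-- Matches the index predicate, so the partial index is a candidate.
EXPLAIN SELECT recipeid FROM recipemetadata WHERE diet_glutenfree;

-- Rows with false have no entries in the partial index, so
-- idx_recipemetadata_glutenfree should not appear in this plan.
EXPLAIN SELECT recipeid FROM recipemetadata WHERE NOT diet_glutenfree;

ROLLBACK;
```

If the second plan never uses the index, even with sequential scans discouraged, that suggests it really is partial. A full index on the column could in principle serve both queries.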

However, I think I would get these exact results on a full index of the column. Is there a way to verify the number of rows indexed in a partial index?

UPDATE

The index is around 40k. I created a full index of the same column and it's over 200k, so it looks like it's definitely partial.
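The size comparison can also be done from the system catalogs. The sketch below assumes the index name from the question (folded to lower case because it was created unquoted) and that statistics are reasonably fresh. reltuples is only an estimate, so treat it as a sanity check and not an exact count.

```sql
-- On-disk size, estimated number of entries, and stored definition
-- (including the WHERE predicate) of the partial index.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS on_disk_size,
       c.reltuples                             AS estimated_entries,
       pg_get_indexdef(c.oid)                  AS definition
FROM pg_class c
WHERE c.relname = 'idx_recipemetadata_glutenfree';

-- Exact number of rows the index predicate covers, for comparison.
SELECT count(*) FROM recipemetadata WHERE diet_glutenfree;
```

If estimated_entries is close to the count from the second query, and well below the table's total row count, the index is covering only the true rows.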
