PostgreSQL的count查询优化 [英] Optimization of count query for PostgreSQL

查看:171
本文介绍了PostgreSQL的count查询优化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 postgresql 中有一个表,其中包含一个不断更新的数组.

I have a table in postgresql that contains an array which is updated constantly.

在我的应用程序中,我需要获取该数组列中不存在特定参数的行数.我的查询如下所示:

In my application i need to get the number of rows for which a specific parameter is not present in that array column. My query looks like this:

select count(id) 
from table 
where not (ARRAY['parameter value'] <@ table.array_column)

但是当增加该查询的行数和执行次数(每秒几次,可能是数百或数千)时,性能会下降很多,在我看来,postgresql 中的计数可能具有线性顺序执行(我不完全确定).

But when increasing the amount of rows and the amount of executions of that query (several times per second, possibly hundreds or thousands) the performance decreses a lot, it seems to me that the counting in postgresql might have a linear order of execution (I’m not completely sure of this).

基本上我的问题是:

是否有我不知道的适用于这种情况的现有模式?什么是最好的方法?

Is there an existing pattern I’m not aware of that applies to this situation? what would be the best approach for this?

您能给我的任何建议将不胜感激.

Any suggestion you could give me would be really appreciated.

推荐答案

是否有我不知道的现有模式适用于此情况?什么是最好的方法?

Is there an existing pattern I’m not aware of that applies to this situation? what would be the best approach for this?

在这种情况下,您最好的选择可能是规范化您的架构.将数组拆分成一个表.在属性表上添加 B 树索引,或对主键进行排序,以便可以通过 property_id 有效地搜索它.

Your best bet in this situation might be to normalize your schema. Split the array out into a table. Add a b-tree index on the table of properties, or order the primary key so it's efficiently searchable by property_id.

CREATE TABLE demo( id integer primary key );
INSERT INTO demo (id) SELECT id FROM arrtable;
CREATE TABLE properties (
  demo_id integer not null references demo(id),
  property integer not null,
  primary key (demo_id, property)
);
CREATE INDEX properties_property_idx ON properties(property);

然后您可以查询属性:

SELECT count(id) 
FROM demo 
WHERE NOT EXISTS (
  SELECT 1 FROM properties WHERE demo.id = properties.demo_id AND property = 1
)

我预计这会比原始查询快很多,但实际上对于相同的样本数据来说是一样的;它在与原始查询相同的 2s 到 3s 范围内运行.同样的问题是,搜索不是的东西比搜索的东西要慢得多;如果我们要查找包含属性的行,我们可以避免 demo 的 seqscan,而只需扫描 properties 以直接匹配 ID.

I expected this to be a lot faster than the original query, but it's actually much the same with the same sample data; it runs in the same 2s to 3s range as your original query. It's the same issue where searching for what is not there is much slower than searching for what is there; if we're looking for rows containing a property we can avoid the seqscan of demo and just scan properties for matching IDs directly.

同样,对包含数组的表进行 seq 扫描也能完成这项工作.

Again, a seq scan on the array-containing table does the job just as well.

这篇关于PostgreSQL的count查询优化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆