优化PostgreSQL的计数查询 [英] Optimization of count query for PostgreSQL

查看:185
本文介绍了优化PostgreSQL的计数查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在postgresql中有一个表,其中包含一个不断更新的数组。



在我的应用程序中,我需要获取特定参数的行数不存在于该数组列中。我的查询如下:

  select count(id)
从表
其中不是ARRAY [ 'parameter value'] <@ table.array_column)

但是当增加行数和该查询的执行量(每秒几次,可能几百或几千)性能决定了很多,在我看来,在postgresql中的计数可能有一个线性的执行顺序(我不完全确定这一点)。



基本上我的问题是:



有没有我不知道的现有模式适用于此情况?



>解决方案


有没有我不知道的现有模式适用于这个
情况?什么是最好的方法?


在这种情况下最好的办法是使规范化。将数组拆分成表。在属性表上添加b-tree索引,或者订购主键,以便可以通过 property_id 有效地搜索。

  CREATE TABLE demo(id integer primary key); 
INSERT INTO demo(id)SELECT id FROM arrtable;
CREATE TABLE属性(
demo_id integer not null references demo(id),
property integer not null,
primary key(demo_id,property)
);
CREATE INDEX properties_property_idx ON properties(property);

然后可以查询属性:

  SELECT count(id)
FROM demo
WHERE NOT EXISTS(
SELECT 1 FROM properties WHERE demo.id = properties.demo_id AND property = 1

我预计这会比原始查询快很多,相同的样本数据相同;它运行在与原始查询相同的2s到3s范围内。这是相同的问题,其中搜索什么是不是比搜索 慢得多; 如果我们正在寻找包含属性的行,我们可以避免 demo 的seqscan并扫描属性



同样,对包含数组的表执行seq扫描也可以完成作业。


I have a table in postgresql that contains an array which is updated constantly.

In my application i need to get the number of rows for which a specific parameter is not present in that array column. My query looks like this:

select count(id) 
from table 
where not (ARRAY['parameter value'] <@ table.array_column)

But when increasing the amount of rows and the amount of executions of that query (several times per second, possibly hundreds or thousands) the performance decreses a lot, it seems to me that the counting in postgresql might have a linear order of execution (I’m not completely sure of this).

Basically my question is:

Is there an existing pattern I’m not aware of that applies to this situation? what would be the best approach for this?

Any suggestion you could give me would be really appreciated.

解决方案

Is there an existing pattern I’m not aware of that applies to this situation? what would be the best approach for this?

Your best bet in this situation might be to normalize your schema. Split the array out into a table. Add a b-tree index on the table of properties, or order the primary key so it's efficiently searchable by property_id.

CREATE TABLE demo( id integer primary key );
INSERT INTO demo (id) SELECT id FROM arrtable;
CREATE TABLE properties (
  demo_id integer not null references demo(id),
  property integer not null,
  primary key (demo_id, property)
);
CREATE INDEX properties_property_idx ON properties(property);

You can then query the properties:

SELECT count(id) 
FROM demo 
WHERE NOT EXISTS (
  SELECT 1 FROM properties WHERE demo.id = properties.demo_id AND property = 1
)

I expected this to be a lot faster than the original query, but it's actually much the same with the same sample data; it runs in the same 2s to 3s range as your original query. It's the same issue where searching for what is not there is much slower than searching for what is there; if we're looking for rows containing a property we can avoid the seqscan of demo and just scan properties for matching IDs directly.

Again, a seq scan on the array-containing table does the job just as well.

这篇关于优化PostgreSQL的计数查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆