PostgreSQL becomes unresponsive when a new index value is added


Question


In my app I have a concept of "seasons" which change discretely over time. All the entities are related to some season, and all entities have season-based indices as well as indices on other fields. When a season change occurs, PostgreSQL decides to use a filtered scan plan based on the season index rather than the more specific field indices. At the very beginning of the season the cost of that plan is very small, so it's OK, but the problem is that a season change brings MANY users in at exactly that moment, so the season-scan query plan becomes bad very fast: it simply scans all the entities in the new season and filters out the target items. After the first auto analyze, Postgres decides to use a good plan, BUT auto analyze runs VERY SLOWLY due to contention, and I suppose it's like a snowball: the more requests are made, the more contention there is because of the bad plan, and thus auto analyze runs more and more slowly. The longest auto analyze run last week took about an hour, and it has become a real problem. I know the PostgreSQL developers decided not to allow choosing which index a query uses, so what is the best way to overcome my problem then?
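To make the stale-statistics explanation above concrete (this interpretation and the query below are my addition, not part of the original question): right after a season change the new season_id value is not yet reflected in the table statistics, so the planner assumes almost no rows match it and a plan that filters on the season index looks cheap. What the planner currently believes about season_id can be inspected via pg_stats:

-- Illustrative check (assumption): if the new season value is missing from
-- most_common_vals / histogram_bounds, the planner will underestimate how
-- many rows the new season contains.
SELECT null_frac, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'race_results'
  AND attname = 'season_id';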


Just to clarify, here is the DDL, one of the "slow" queries, and the explain results before and after auto analyze.

DDL

CREATE TABLE race_results (
  id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('race_results_id_seq'::regclass),
  user_id INTEGER NOT NULL,
  opponent_id INTEGER,
  season_id INTEGER NOT NULL,
  type RACE_TYPE NOT NULL DEFAULT 'battle'::race_type,
  elo_delta INTEGER NOT NULL,
  opponent_elo_delta INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX race_results_type_user_id_index ON race_results USING BTREE (season_id, type, user_id);
CREATE INDEX race_results_type_opponent_id_index ON race_results USING BTREE (season_id, type, opponent_id);
CREATE INDEX race_results_opponent_id_index ON race_results USING BTREE (opponent_id);
CREATE INDEX race_results_user_id_index ON race_results USING BTREE (user_id);

Query

SELECT 1000 + COALESCE(SUM(CASE WHEN user_id = 6446 THEN elo_delta ELSE opponent_elo_delta END), 0)
        FROM race_results
        WHERE type = 'battle' :: race_type AND (user_id = 6446 OR opponent_id = 6446) AND
              season_id = current_season_id()


Results of explain before auto analyze (as you can see, more than a thousand items are already removed by the filter, and it soon becomes hundreds of thousands for each request)


Results of explain analyze after auto analyze (now Postgres decides to use the right index and no filtering is needed anymore, but the problem is that auto analyze takes too long, partly because of the contention caused by the ineffective index selection shown in the previous plan)
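The plans themselves were posted as images and are not included here. As a hedged aside (my addition, not from the original question), equivalent plans in text form can be captured like this:

EXPLAIN (ANALYZE, BUFFERS)
SELECT 1000 + COALESCE(SUM(CASE WHEN user_id = 6446 THEN elo_delta ELSE opponent_elo_delta END), 0)
FROM race_results
WHERE type = 'battle'::race_type
  AND (user_id = 6446 OR opponent_id = 6446)
  AND season_id = current_season_id();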


PS: Right now I'm working around the problem by turning the application server off about 10 seconds after the season changes, so that Postgres gets the new data and starts autoanalyze, and then turning it back on when autoanalyze finishes. But such a solution involves downtime, which is not desirable, and overall it looks weird.
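A possible alternative to the downtime workaround (my suggestion, not something from the original post): run a manual ANALYZE on the table right after the season change instead of waiting for autoanalyze. A manual ANALYZE only samples the table and, unlike autovacuum workers, is not subject to cost-based throttling by default, so it usually finishes quickly even under load. The per-table autoanalyze thresholds can also be lowered so the automatic run triggers sooner; the numbers below are purely illustrative:

-- Sketch (assumptions, not from the original post): refresh statistics
-- explicitly right after the new season row is created, so the planner
-- sees the new season_id distribution immediately.
ANALYZE race_results;

-- Optionally make autoanalyze itself react faster on this table; the
-- values are illustrative and would need tuning for the real workload.
ALTER TABLE race_results SET (
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_analyze_threshold    = 1000
);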

Answer


Finally I found a solution. It's not perfect and I won't mark it as the best one, but it works and could help someone.


Instead of indices on season, type and user/opponent id, I now have these indices:

CREATE INDEX race_results_type_user_id_index ON race_results USING BTREE (user_id, season_id, type);
CREATE INDEX race_results_type_opponent_id_index ON race_results USING BTREE (opponent_id, season_id, type);


One problem appeared - I still needed an index on season for other queries anyway, but when I add the index

CREATE INDEX race_results_season_index ON race_results USING BTREE (season_id);


the planner tries to use it again instead of the right indices, and the whole situation repeats itself. What I've done is simply add one more field, 'season_id_clone', which contains the same data as 'season_id', and I made an index on it. Now, when I need to filter something based on season (not including the queries from the first post), I use season_id_clone in the query. I know it's weird, but I haven't found anything better.
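For completeness, here is a rough sketch of what the clone-column workaround might look like (the backfill and the trigger are my assumptions about how the column is kept in sync; the answer itself only mentions the extra column and its index):

-- Sketch of the clone-column approach described above (details assumed).
ALTER TABLE race_results ADD COLUMN season_id_clone INTEGER;
UPDATE race_results SET season_id_clone = season_id;

CREATE INDEX race_results_season_id_clone_index
    ON race_results USING BTREE (season_id_clone);

-- Keep the clone in sync with season_id on inserts and updates.
CREATE OR REPLACE FUNCTION copy_season_id() RETURNS trigger AS $$
BEGIN
    NEW.season_id_clone := NEW.season_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER race_results_copy_season_id
    BEFORE INSERT OR UPDATE OF season_id ON race_results
    FOR EACH ROW EXECUTE PROCEDURE copy_season_id();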
