WHERE子句不同组合的索引策略,包括文字样式 [英] Indexing strategy for different combinations of WHERE clauses incl. text patterns

查看:106
本文介绍了WHERE子句不同组合的索引策略,包括文字样式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此处继续其他问题:

如何获取date_part查询以命中索引?

在执行以下查询时,它命中了复合索引在datelocal,视图,印象,性别,年龄组字段上创建:

When executing the following query, it hits a compound index I created on the datelocal, views, impressions, gender, agegroup fields:

SELECT date_part('hour', datelocal)                AS hour
     , SUM(views) FILTER (WHERE gender = 'male')   AS male
     , SUM(views) FILTER (WHERE gender = 'female') AS female
FROM   reportimpression
WHERE  datelocal >= '2019-02-01' AND datelocal <  '2019-03-01'
GROUP  BY 1
ORDER  BY 1;

但是,我也希望能够根据例如,在哪里:

However, I'd like to be able to also filter this query down based on additional clauses in the WHERE, for example:

SELECT date_part('hour', datelocal)                AS hour
     , SUM(views) FILTER (WHERE gender = 'male')   AS male
     , SUM(views) FILTER (WHERE gender = 'female') AS female
FROM   reportimpression
WHERE  datelocal >= '2019-02-01' AND datelocal <  '2019-03-01'
AND network LIKE '%'
GROUP  BY 1
ORDER  BY 1;

第二个查询比第一个查询慢很多,尽管它在更少的记录上运行

This second query is MUCH slower than the first, although it should be operating on far fewer records, in addition to the fact that it doesn't hit my index.

表模式:

CREATE TABLE reportimpression (
    datelocal timestamp without time zone,
    devicename text,
    network text,
    sitecode text,
    advertisername text,
    mediafilename text,
    gender text,
    agegroup text,
    views integer,
    impressions integer,
    dwelltime numeric
);

-- Indices -------------------------------------------------------

CREATE INDEX reportimpression_datelocal_index ON reportimpression(datelocal timestamp_ops);
CREATE INDEX reportimpression_viewership_index ON reportimpression(datelocal timestamp_ops,views int4_ops,impressions int4_ops,gender text_ops,agegroup text_ops);
CREATE INDEX reportimpression_test_index ON reportimpression(datelocal timestamp_ops,(date_part('hour'::text, datelocal)) float8_ops);

分析输出:

Finalize GroupAggregate  (cost=1005368.37..1005385.70 rows=3151 width=24) (actual time=70615.636..70615.649 rows=24 loops=1)
  Group Key: (date_part('hour'::text, datelocal))
  ->  Sort  (cost=1005368.37..1005369.94 rows=3151 width=24) (actual time=70615.631..70615.634 rows=48 loops=1)
        Sort Key: (date_part('hour'::text, datelocal))
        Sort Method: quicksort  Memory: 28kB
        ->  Gather  (cost=1005005.62..1005331.75 rows=3151 width=24) (actual time=70615.456..70641.208 rows=48 loops=1)
              Workers Planned: 1
              Workers Launched: 1
              ->  Partial HashAggregate  (cost=1004005.62..1004016.65 rows=3151 width=24) (actual time=70613.132..70613.152 rows=24 loops=2)
                    Group Key: date_part('hour'::text, datelocal)
                    ->  Parallel Seq Scan on reportimpression  (cost=0.00..996952.63 rows=2821195 width=17) (actual time=0.803..69876.914 rows=2429159 loops=2)
                          Filter: ((datelocal >= '2019-02-01 00:00:00'::timestamp without time zone) AND (datelocal < '2019-03-01 00:00:00'::timestamp without time zone) AND (network ~~ '%'::text))
                          Rows Removed by Filter: 6701736
Planning time: 0.195 ms
Execution time: 70641.349 ms

我是否需要创建其他索引,调整SELECT或完全其他?

Do I need to create additional indexes, tweak my SELECT, or something else entirely?

推荐答案

您添加的谓词使用 Like 运算符:

Your added predicate uses the LIKE operator:

AND network LIKE '%'

实际查询计划取决于您通过的内容而不是'%'。
但是,通常,纯btree索引对此没有用。您将需要一个三字母组合索引或使用文本搜索基础结构或类似工具,具体取决于您要寻找的模式。

The actual query plan depends on what you pass instead of '%'. But, generally, plain btree indexes are useless for this. You'll need a trigram index or use the text search infrastructure or similar, depending on what patterns you might be looking for.

请参阅:

  • PostgreSQL LIKE query performance variations
  • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

您甚至可以结合使用多种索引策略。示例:

You might even combine multiple indexing strategies. Example:

  • PostgreSQL: Find sentences closest to a given sentence

如果应该这样:

AND network = '<input_string>'

然后,一定要使用 = 运算符,而不是 Like 。按重要性升序排列的原因:

then, by all means, actually use the = operator, not LIKE. Reasons in ascending order of importance:


  1. 较短

  2. 较少混淆

  3. 使Postgres计划程序的工作更简单(便宜得多)

  4. 正确

  1. shorter
  2. less confusing
  3. makes the job for the Postgres planner simpler (very slightly cheaper)
  4. correct

不小心包含特殊字符的字符串,可能会得到错误的结果。请参阅:

If you pass a string with special characters inadvertently, you might get incorrect results. See:

  • Escape function for regular expression or LIKE patterns

这篇关于WHERE子句不同组合的索引策略,包括文字样式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆