How to get date_part query to hit index?

Views: 91  Published: 2020/5/29 21:42:38  Tags: postgresql, indexing, aggregate, postgresql-performance

Problem description

I have yet to be able to get this query to hit an index instead of performing a full scan. I have another query that uses date_part('day', datelocal) against an almost identical table (that table just has a bit less data but the same structure), and that one will hit the index I created on the datelocal column (which is a timestamp without time zone). Query (this one performs a parallel seq scan on the table and does a memory quicksort):

SELECT
    date_part('hour', datelocal) AS hour,
    SUM(CASE WHEN gender LIKE 'male' THEN views ELSE 0 END) AS male,
    SUM(CASE WHEN gender LIKE 'female' THEN views ELSE 0 END) AS female
FROM reportimpression
WHERE datelocal >= '2-1-2019' AND datelocal < '2-28-2019'
GROUP BY date_part('hour', datelocal)
ORDER BY date_part('hour', datelocal)

Here is the other one that does hit my datelocal index:

SELECT
    date_part('day', datelocal) AS day,
    SUM(CASE WHEN gender LIKE 'male' THEN views ELSE 0 END) AS male,
    SUM(CASE WHEN gender LIKE 'female' THEN views ELSE 0 END) AS female
FROM reportimpressionday
WHERE datelocal >= '2-1-2019' AND datelocal < '2-28-2019'
GROUP BY date_trunc('day', datelocal), date_part('day', datelocal)
ORDER BY date_trunc('day', datelocal)

Banging my head about this! Any ideas as to how I can speed up the first one or at least make it hit an index? I've tried creating an index on the datelocal field, a compound index on datelocal, gender, and views, and an expression index on date_part('hour', datelocal) but none of that has worked.

Schema:

-- Table Definition ----------------------------------------------

CREATE TABLE reportimpression (
    datelocal timestamp without time zone,
    devicename text,
    network text,
    sitecode text,
    advertisername text,
    mediafilename text,
    gender text,
    agegroup text,
    views integer,
    impressions integer,
    dwelltime numeric
);

-- Indices -------------------------------------------------------

CREATE INDEX reportimpression_datelocal_index ON reportimpression(datelocal timestamp_ops);
CREATE INDEX reportimpression_viewership_index ON reportimpression(datelocal timestamp_ops,views int4_ops,impressions int4_ops,gender text_ops,agegroup text_ops);
CREATE INDEX reportimpression_test_index ON reportimpression(datelocal timestamp_ops,(date_part('hour'::text, datelocal)) float8_ops);

-- Table Definition ----------------------------------------------

CREATE TABLE reportimpressionday (
    datelocal timestamp without time zone,
    devicename text,
    network text,
    sitecode text,
    advertisername text,
    mediafilename text,
    gender text,
    agegroup text,
    views integer,
    impressions integer,
    dwelltime numeric
);

-- Indices -------------------------------------------------------

CREATE INDEX reportimpressionday_datelocal_index ON reportimpressionday(datelocal timestamp_ops);
CREATE INDEX reportimpressionday_detail_index ON reportimpressionday(datelocal timestamp_ops,views int4_ops,impressions int4_ops,gender text_ops,agegroup text_ops);

Explain (analyze, buffers) output:

Finalize GroupAggregate  (cost=999842.42..999859.67 rows=3137 width=24) (actual time=43754.700..43754.714 rows=24 loops=1)
  Group Key: (date_part('hour'::text, datelocal))
  Buffers: shared hit=123912 read=823290
  I/O Timings: read=81228.280
  ->  Sort  (cost=999842.42..999843.99 rows=3137 width=24) (actual time=43754.695..43754.698 rows=48 loops=1)
        Sort Key: (date_part('hour'::text, datelocal))
        Sort Method: quicksort  Memory: 28kB
        Buffers: shared hit=123912 read=823290
        I/O Timings: read=81228.280
        ->  Gather  (cost=999481.30..999805.98 rows=3137 width=24) (actual time=43754.520..43777.558 rows=48 loops=1)
              Workers Planned: 1
              Workers Launched: 1
              Buffers: shared hit=123912 read=823290
              I/O Timings: read=81228.280
              ->  Partial HashAggregate  (cost=998481.30..998492.28 rows=3137 width=24) (actual time=43751.649..43751.672 rows=24 loops=2)
                    Group Key: date_part('hour'::text, datelocal)
                    Buffers: shared hit=123912 read=823290
                    I/O Timings: read=81228.280
                    ->  Parallel Seq Scan on reportimpression  (cost=0.00..991555.98 rows=2770129 width=17) (actual time=13.097..42974.126 rows=2338145 loops=2)
                          Filter: ((datelocal >= '2019-02-01 00:00:00'::timestamp without time zone) AND (datelocal < '2019-02-28 00:00:00'::timestamp without time zone))
                          Rows Removed by Filter: 6792750
                          Buffers: shared hit=123912 read=823290
                          I/O Timings: read=81228.280
Planning time: 0.185 ms
Execution time: 43777.701 ms

Recommended answer

Well, both your queries are on different tables (reportimpression vs. reportimpressionday), so the comparison of the two queries really isn't a comparison. Did you ANALYZE both? Various column statistics also may play a role. Index or table bloat may be different. Does a larger part of all rows qualify for Feb 2019? Etc.
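
The ANALYZE and statistics questions above can be checked directly. This is a minimal sketch, assuming both tables live in a schema on the search path; it refreshes the planner statistics and then reads the standard views pg_stat_user_tables and pg_stats. A high n_dead_tup relative to n_live_tup hints at bloat, and the correlation of datelocal affects how cheap the planner considers an index scan to be.

```sql
-- Refresh planner statistics for both tables.
ANALYZE reportimpression;
ANALYZE reportimpressionday;

-- When were statistics last gathered, and how many dead rows remain?
SELECT relname
     , n_live_tup
     , n_dead_tup
     , last_analyze
     , last_autoanalyze
     , last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname IN ('reportimpression', 'reportimpressionday');

-- Column statistics the planner uses for the range filter on datelocal.
SELECT tablename
     , attname
     , n_distinct
     , correlation
FROM   pg_stats
WHERE  tablename IN ('reportimpression', 'reportimpressionday')
AND    attname = 'datelocal';
```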

One shot in the dark, compare the percentages for both tables:

SELECT tbl, round(share * 100 / total, 2) As percentage
FROM  (
   SELECT text 'reportimpression' AS tbl
        , count(*)::numeric AS total
        , count(*) FILTER (WHERE datelocal >= '2019-02-01' AND datelocal < '2019-03-01')::numeric AS share
   FROM  reportimpression

   UNION ALL
   SELECT 'reportimpressionday'
        , count(*)
        , count(*) FILTER (WHERE datelocal >= '2019-02-01' AND datelocal < '2019-03-01')
   FROM  reportimpressionday
  ) sub;

Is the one for reportimpression bigger?

Generally, your index reportimpression_datelocal_index on (datelocal) looks good for it, and reportimpression_viewership_index even allows index-only scans if autovacuum beats the write load on the table. (Though impressions & agegroup are just dead freight for this, and it would work even better without them.)

You got 26.6 percent, and day is 26.4 percent for my query. For such a large percentage, indexes are typically not useful at all. A sequential scan is typically the fastest way. Only index-only scans may still make sense if the underlying table is much bigger. (Or you have severe table bloat and less bloated indexes, which makes indexes more attractive again.)

Your first query may just be across the tipping point. Try narrowing the time frame until you see index-only scans. You won't see (bitmap) index scans with more than roughly 5 % of all rows qualifying (depends on many factors).

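One way to try the narrowing experiment is sketched below. The index name reportimpression_datelocal_gender_views_idx and the single-day range are assumptions made for illustration, not something from the question; the index simply drops the columns called dead freight above. Whether the planner then picks an index-only scan still depends on table size, bloat, and the visibility map, so treat this as an experiment rather than a guaranteed fix.

```sql
-- Hypothetical leaner index: only the columns this query actually reads.
CREATE INDEX reportimpression_datelocal_gender_views_idx
    ON reportimpression (datelocal, gender, views);

-- Update the visibility map and statistics so index-only scans
-- can avoid most heap fetches.
VACUUM (ANALYZE) reportimpression;

-- Narrow the range (here: one day) and look at the scan node in the plan.
-- Widen the range step by step to find where the plan flips to a seq scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_part('hour', datelocal)                AS hour
     , SUM(views) FILTER (WHERE gender = 'male')   AS male
     , SUM(views) FILTER (WHERE gender = 'female') AS female
FROM   reportimpression
WHERE  datelocal >= '2019-02-01'
AND    datelocal <  '2019-02-02'
GROUP  BY 1
ORDER  BY 1;
```
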
Be that as it may, consider these modified queries:
SELECT date_part('hour', datelocal)                AS hour
     , SUM(views) FILTER (WHERE gender = 'male')   AS male
     , SUM(views) FILTER (WHERE gender = 'female') AS female
FROM   reportimpression
WHERE  datelocal >= '2019-02-01'
AND    datelocal <  '2019-03-01' -- '2019-02-28'  -- ?
GROUP  BY 1
ORDER  BY 1;

SELECT date_trunc('day', datelocal)                AS day
     , SUM(views) FILTER (WHERE gender = 'male')   AS male
     , SUM(views) FILTER (WHERE gender = 'female') AS female
FROM   reportimpressionday
WHERE  datelocal >= '2019-02-01'
AND    datelocal <  '2019-03-01'
GROUP  BY 1
ORDER  BY 1;
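
The rewritten queries above replace SUM(CASE ...) with the aggregate FILTER clause. The self-contained sketch below (the sample rows are made up for illustration) shows that both forms agree when rows match, and points out one difference: with no matching rows, the CASE form returns 0 while the FILTER form returns NULL. Wrap it in COALESCE if a zero is required.

```sql
WITH sample(gender, views) AS (
   VALUES ('male', 3), ('male', 4), ('other', 5)   -- made-up rows
)
SELECT SUM(CASE WHEN gender = 'male'   THEN views ELSE 0 END)   AS male_case      -- 7
     , SUM(views) FILTER (WHERE gender = 'male')                 AS male_filter    -- 7
     , SUM(CASE WHEN gender = 'female' THEN views ELSE 0 END)   AS female_case    -- 0
     , SUM(views) FILTER (WHERE gender = 'female')               AS female_filter  -- NULL
     , COALESCE(SUM(views) FILTER (WHERE gender = 'female'), 0)  AS female_zero    -- 0
FROM   sample;
```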

 
 
 
要点
 
 
  
  当使用本地化的日期格式（如'2-1-2019'）时，请通过  to_timestamp（） 带有明确的格式说明符。否则，这取决于语言环境设置，并且从具有不同设置的会话中调用时可能会（无提示）中断。而是使用所示的ISO日期/时间格式，而不依赖于区域设置。


Major points


When using localized date format like '2-1-2019', go through to_timestamp() with explicit format specifiers. Else this depends on locale settings and might break (silently) when called from a session with different settings. Rather use ISO date / time formats as demonstrated, which do not depend on locale settings.

Looks like you want to include the whole month of February. But your query misses out on the upper bound. For one, February may have 29 days. And datelocal < '2-28-2019' excludes all of Feb 28 as well. Use datelocal < '2019-03-01' instead.

It's cheaper to group & sort by the same expression as you have in the SELECT list if you can. So use date_trunc() there, too. Don't use different expressions without need. If you need the datepart in the result, apply it on the grouped expression, like:
SELECT date_part('day', date_trunc('day', datelocal)) AS day
...
GROUP  BY date_trunc('day', datelocal)
ORDER  BY date_trunc('day', datelocal);

A bit more noisy code, but faster (and possibly easier to optimize for the query planner, too).

Use the aggregate FILTER clause in Postgres 9.4 or later. It's cleaner and a bit faster. See:

How can I simplify this game statistics query?
For absolute performance, is SUM faster or COUNT?
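
The first two major points can be illustrated with a short session. In PostgreSQL, the session setting that controls how an ambiguous literal like '2-1-2019' is read is DateStyle; the values below are chosen only to show the effect. The last two statements check the upper bound: a row from the afternoon of Feb 28 fails the old condition but passes the corrected one, and date_trunc() plus an interval gives the start of the next month without hard-coding the month length.

```sql
-- Ambiguous literal: the interpretation follows the session's DateStyle.
SET datestyle = 'ISO, MDY';
SELECT timestamp '2-1-2019';   -- read as February 1, 2019

SET datestyle = 'ISO, DMY';
SELECT timestamp '2-1-2019';   -- read as January 2, 2019

-- Explicit format specifiers: independent of DateStyle.
-- to_timestamp() returns timestamptz, hence the cast back to timestamp.
SELECT to_timestamp('2-1-2019', 'MM-DD-YYYY')::timestamp;

RESET datestyle;

-- A row from Feb 28, 13:00 is dropped by the old bound, kept by the new one.
SELECT timestamp '2019-02-28 13:00' < timestamp '2019-02-28' AS old_bound   -- false
     , timestamp '2019-02-28 13:00' < timestamp '2019-03-01' AS new_bound;  -- true

-- Exclusive upper bound for any month, leap years included.
SELECT date_trunc('month', timestamp '2019-02-01') + interval '1 month' AS upper_bound;
-- 2019-03-01 00:00:00
```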

