如何在 Redshift 中找到访问最多的表? [英] How to find the most accessed table in Redshift?

查看:23
本文介绍了如何在 Redshift 中找到访问最多的表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们正在将实时数据流式传输到 Redshift.瓶颈是没有可以并发运行的表负载.我们目前每 15 分钟运行超过 1000 次以上的负载.

We are streaming realtime data to Redshift. The bottleneck is no of table loads that can run concurrently. We at present are running more than 1000+ loads every 15mins.

但我们希望根据用户使用这些表的频率来减少这个数字.请建议我们如何在 Redshift 中获取此信息.

But we want to reduce this number based on how frequently these tables are used by the users. Please suggest how can we get this information in Redshift.

推荐答案

这个 view awslabs 开源,可用于查询最常查询的表.

This view open sourced by awslabs can be used to query the most frequently queried tables.

CREATE OR REPLACE VIEW admin.v_get_table_scan_frequency
AS
SELECT 
    database, 
    schema AS schemaname, 
    table_id, 
    "table" AS tablename, 
    size, 
    sortkey1, 
    NVL(s.num_qs,0) num_qs
FROM svv_table_info t
LEFT JOIN (SELECT
   tbl, perm_table_name,
   COUNT(DISTINCT query) num_qs
FROM
   stl_scan s
WHERE 
   s.userid > 1
   AND s.perm_table_name NOT IN ('Internal Worktable','S3')
GROUP BY 
   tbl, perm_table_name) s ON s.tbl = t.table_id
AND t."schema" NOT IN ('pg_internal')
ORDER BY 7 desc;

表格

\d admin.v_get_table_scan_frequency
   Column   |  Type  | Modifiers
------------+--------+-----------
 database   | text   |
 schemaname | text   |
 table_id   | oid    |
 tablename  | text   |
 size       | bigint |
 sortkey1   | text   |
 num_qs     | bigint |

查询

select * from admin.v_get_table_scan_frequency order by num_qs;

结果

database | schemaname | table_id | tablename | size  | sortkey1      | num_qs
-----------------+------------+----------+------------------------------------------+-------+---------------+--------
 db      | product    | 1        | table1    |    92 | AUTO(SORTKEY) |  13448
 db      | product    | 2        | table2    |   180 | AUTO(SORTKEY) |  13389

在 Prometheus 中保留此查询的时间序列数据有助于找到每个表随时间变化的速率和频率趋势.基于此,我们可以决定在 Redshift 中刷新数据的频率.

Keeping a time series data of this query in Prometheus can help find rate and frequency trend over time for each table. Based on that we can decided how frequently to refresh data in Redshift.

这篇关于如何在 Redshift 中找到访问最多的表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆