Redshift table size


Question

This is a puzzling one for me, and I would like to understand why.

I have two tables that are almost identical; the only differences are one column's data type and the sort key.

table                             mbytes    rows
stg_user_event_properties_hist    460948    2378751028
stg_user_event_properties_hist_1  246442    2513860837

Even though they have almost the same number of rows, one is close to double the size of the other.
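Figures like these can be pulled from Redshift's SVV_TABLE_INFO system view; a minimal query (the table-name pattern here is just matched to the question):

```sql
-- SVV_TABLE_INFO reports size in 1 MB blocks and an estimated row count.
SELECT "table", size AS mbytes, tbl_rows AS rows
FROM svv_table_info
WHERE "table" LIKE 'stg_user_event_properties_hist%'
ORDER BY "table";
```

SVV_TABLE_INFO only shows tables visible to the current user and is also handy for spotting skew and unsorted data.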

Here are the table structures:

CREATE TABLE stg.stg_user_event_properties_hist
(
id                bigint,
source            varchar(20),
time_of_txn       timestamp,
product           varchar(50),
region            varchar(50),
city              varchar(100),
state             varchar(100),
zip               varchar(10),
price             integer,
category          varchar(50),
model             varchar(50),
origin            varchar(50),
l_code            varchar(10),
d_name            varchar(100),
d_id              varchar(10),
medium            varchar(255),
network           varchar(255),
campaign          varchar(255),
creative          varchar(255),
event             varchar(255),
property_name     varchar(100),
property_value    varchar(4000),
source_file_name  varchar(255),
etl_batch_id      integer,
etl_row_id        integer,
load_date         timestamp       
);



CREATE TABLE stg.stg_user_event_properties_hist_1
(
id                bigint,
source            varchar(20),
time_of_txn       timestamp,
product           varchar(50),
region            varchar(50),
city              varchar(100),
state             varchar(100),
zip               varchar(10),
price             integer,
category          varchar(50),
model             varchar(50),
origin            varchar(50),
l_code            varchar(10),
d_name            varchar(100),
d_id              varchar(10),
medium            varchar(255),
network           varchar(255),
campaign          varchar(255),
creative          varchar(255),
event             varchar(255),
property_name     varchar(100),
property_value    varchar(4000),
source_file_name  varchar(255),
etl_batch_id      integer,
etl_row_id        varchar(20),
load_date         timestamp
);

The differences, again: etl_row_id has data type varchar(20) in _1 and integer in the other table, and the first table has a sort key on the source column.

What would explain the size difference?

UPDATE: The problem was both compression and sort keys. The _1 table, created with CTAS (11 of 26), had different compression settings, and the first table was created with a compound sort key of 14 columns. After recreating that table with no sort keys (it's a history table, after all), its size went down to 231 GB.
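To see which encodings Redshift would actually recommend for a table like this, ANALYZE COMPRESSION can be run against it; it samples the data and reports a suggested encoding per column:

```sql
-- Samples rows from the table and reports, for each column, the
-- recommended encoding and the estimated percentage size reduction.
ANALYZE COMPRESSION stg.stg_user_event_properties_hist;
```

Note that ANALYZE COMPRESSION takes an exclusive lock while sampling, so it is best run outside peak load.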

Accepted answer

I suspect that the larger table has different compression settings, or no compression at all. You can use our view v_generate_tbl_ddl to generate table DDL that includes the compression settings.
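v_generate_tbl_ddl comes from the amazon-redshift-utils repository; assuming it has been installed into an admin schema as in that repo, a sketch of both checks might look like this:

```sql
-- v_generate_tbl_ddl emits one row per DDL line, ordered by seq.
SELECT ddl
FROM admin.v_generate_tbl_ddl
WHERE schemaname = 'stg'
  AND tablename  = 'stg_user_event_properties_hist'
ORDER BY seq;

-- A quicker look at the current encodings and sort-key positions
-- (PG_TABLE_DEF only shows schemas on the search_path):
SET search_path TO stg;
SELECT "column", type, encoding, sortkey
FROM pg_table_def
WHERE tablename = 'stg_user_event_properties_hist';
```

Comparing the encoding column across the two tables would confirm whether compression settings explain the gap.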

Even with the same compression settings, table size can vary with different sort keys. The sort key is used to place the data into blocks on disk. If one sort key places many similar column values together, the data compresses better and requires less space. For example, sorting by a low-cardinality column like source groups identical values into the same blocks, where run-length or dictionary-style encodings are very effective.
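One way to see the effect directly is to rebuild the same data under different sort keys and compare footprints; a sketch, with hypothetical table names:

```sql
-- Rebuild the same data with and without a sort key, then compare sizes.
CREATE TABLE stg.hist_unsorted AS
SELECT * FROM stg.stg_user_event_properties_hist;

CREATE TABLE stg.hist_by_source SORTKEY (source) AS
SELECT * FROM stg.stg_user_event_properties_hist;

SELECT "table", size AS mbytes
FROM svv_table_info
WHERE "table" IN ('hist_unsorted', 'hist_by_source');
```

The size difference between the two copies isolates the sort key's contribution, independent of column encodings.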

