Redshift table size


Question

This is a puzzling one for me, and I would like to understand why.

I have two tables that are almost identical; the only differences are one column's data type and the sort key.

table                             mbytes    rows
stg_user_event_properties_hist    460948    2378751028
stg_user_event_properties_hist_1  246442    2513860837

Even though they have almost the same number of rows, one is close to double the size of the other.
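Figures like these can be pulled from Redshift's SVV_TABLE_INFO system view; a minimal query (the table-name pattern here is just matched to the question):

```sql
-- SVV_TABLE_INFO reports size in 1 MB blocks and an estimated row count.
SELECT "table", size AS mbytes, tbl_rows AS rows
FROM svv_table_info
WHERE "table" LIKE 'stg_user_event_properties_hist%'
ORDER BY "table";
```

SVV_TABLE_INFO only shows tables visible to the current user and is also handy for spotting skew and unsorted data.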

Here are the table structures:

CREATE TABLE stg.stg_user_event_properties_hist
(
id                bigint,
source            varchar(20),
time_of_txn       timestamp,
product           varchar(50),
region            varchar(50),
city              varchar(100),
state             varchar(100),
zip               varchar(10),
price             integer,
category          varchar(50),
model             varchar(50),
origin            varchar(50),
l_code            varchar(10),
d_name            varchar(100),
d_id              varchar(10),
medium            varchar(255),
network           varchar(255),
campaign          varchar(255),
creative          varchar(255),
event             varchar(255),
property_name     varchar(100),
property_value    varchar(4000),
source_file_name  varchar(255),
etl_batch_id      integer,
etl_row_id        integer,
load_date         timestamp       
);



CREATE TABLE stg.stg_user_event_properties_hist_1
(
id                bigint,
source            varchar(20),
time_of_txn       timestamp,
product           varchar(50),
region            varchar(50),
city              varchar(100),
state             varchar(100),
zip               varchar(10),
price             integer,
category          varchar(50),
model             varchar(50),
origin            varchar(50),
l_code            varchar(10),
d_name            varchar(100),
d_id              varchar(10),
medium            varchar(255),
network           varchar(255),
campaign          varchar(255),
creative          varchar(255),
event             varchar(255),
property_name     varchar(100),
property_value    varchar(4000),
source_file_name  varchar(255),
etl_batch_id      integer,
etl_row_id        varchar(20),
load_date         timestamp
);

The differences, again: etl_row_id has data type varchar(20) in _1 and integer in the other table, and the first table has a sort key on the source column.

What would explain the size difference?

UPDATE: The problem was both compression and sort keys. The _1 table, created with CTAS (11 of 26), had different compression settings, and the first table was created with a compound sort key of 14 columns. After recreating that table with no sort keys (it's a history table, after all), its size went down to 231 GB.
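To see which encodings Redshift would actually recommend for a table like this, ANALYZE COMPRESSION can be run against it; it samples the data and reports a suggested encoding per column:

```sql
-- Samples rows from the table and reports, for each column, the
-- recommended encoding and the estimated percentage size reduction.
ANALYZE COMPRESSION stg.stg_user_event_properties_hist;
```

Note that ANALYZE COMPRESSION takes an exclusive lock while sampling, so it is best run outside peak load.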

Accepted answer

I suspect that the larger table has different compression settings, or no compression at all. You can use our view v_generate_tbl_ddl to generate table DDL that includes the compression settings.
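v_generate_tbl_ddl comes from the amazon-redshift-utils repository; assuming it has been installed into an admin schema as in that repo, a sketch of both checks might look like this:

```sql
-- v_generate_tbl_ddl emits one row per DDL line, ordered by seq.
SELECT ddl
FROM admin.v_generate_tbl_ddl
WHERE schemaname = 'stg'
  AND tablename  = 'stg_user_event_properties_hist'
ORDER BY seq;

-- A quicker look at the current encodings and sort-key positions
-- (PG_TABLE_DEF only shows schemas on the search_path):
SET search_path TO stg;
SELECT "column", type, encoding, sortkey
FROM pg_table_def
WHERE tablename = 'stg_user_event_properties_hist';
```

Comparing the encoding column across the two tables would confirm whether compression settings explain the gap.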

Even with the same compression settings, table size can vary with different sort keys. The sort key is used to place the data into blocks on disk. If one sort key places many similar column values together, the data compresses better and requires less space. For example, sorting by a low-cardinality column like source groups identical values into the same blocks, where run-length or dictionary-style encodings are very effective.
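One way to see the effect directly is to rebuild the same data under different sort keys and compare footprints; a sketch, with hypothetical table names:

```sql
-- Rebuild the same data with and without a sort key, then compare sizes.
CREATE TABLE stg.hist_unsorted AS
SELECT * FROM stg.stg_user_event_properties_hist;

CREATE TABLE stg.hist_by_source SORTKEY (source) AS
SELECT * FROM stg.stg_user_event_properties_hist;

SELECT "table", size AS mbytes
FROM svv_table_info
WHERE "table" IN ('hist_unsorted', 'hist_by_source');
```

The size difference between the two copies isolates the sort key's contribution, independent of column encodings.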

