红移表大小 [英] Redshift table size
问题描述
这对我来说更像是一个令人困惑的问题,想了解原因.
This is more like a puzzling question for me and would like to understand why.
我有两个表,几乎完全相同,唯一的区别是一列的数据类型和排序键.
I have two tables, almost identical the only differences are one column's data type and sortkey.
table mbytes rows
stg_user_event_properties_hist 460948 2378751028
stg_user_event_properties_hist_1 246442 2513860837
即使行数几乎相同,大小也接近两倍.
Even though they have almost same number of rows, size is close to double.
这是表格结构
stg.stg_user_event_properties_hist
(
id bigint,
source varchar(20),
time_of_txn timestamp,
product varchar(50),
region varchar(50),
city varchar(100),
state varchar(100),
zip varchar(10),
price integer,
category varchar(50),
model varchar(50),
origin varchar(50),
l_code varchar(10),
d_name varchar(100),
d_id varchar(10),
medium varchar(255),
network varchar(255),
campaign varchar(255),
creative varchar(255),
event varchar(255),
property_name varchar(100),
property_value varchar(4000),
source_file_name varchar(255),
etl_batch_id integer,
etl_row_id integer,
load_date timestamp
);
stg.stg_user_event_properties_hist_1
(
id bigint,
source varchar(20),
time_of_txn timestamp,
product varchar(50),
region varchar(50),
city varchar(100),
state varchar(100),
zip varchar(10),
price integer,
category varchar(50),
model varchar(50),
origin varchar(50),
l_code varchar(10),
d_name varchar(100),
d_id varchar(10),
medium varchar(255),
network varchar(255),
campaign varchar(255),
creative varchar(255),
event varchar(255),
property_name varchar(100),
property_value varchar(4000),
source_file_name varchar(255),
etl_batch_id integer,
etl_row_id varchar(20),
load_date timestamp
);
差异再次出现 etl_row_id
在_1中的数据类型为varchar(20),在另一个表中的数据为整数,并且第一个表在源列上具有排序键.
The differences again etl_row_id
has data type varchar(20) in _1, integer in the other table, and the first table has a sortkey on source column.
对尺寸差异的解释是什么?
What would be the explanation for the size difference?
更新:问题在于压缩和排序键,即使使用CTAS 11 of 26创建的_1表具有不同的压缩设置,第一个表也是使用14列的Compound SortKey创建的,没有排序键的情况下重新创建了该表(这是一个历史表,之后是全部)的容量降至231GB.
UPDATE: The problem was both compression and sort keys, even though _1 table created with CTAS 11 of 26 had different compression settings, also the first table was created with Compound SortKey of 14 columns, recreated the table with no sort keys (it's a history table after all) size went down to 231GB.
推荐答案
怀疑较大的表具有不同的压缩设置或根本没有压缩.您可以使用我们的视图 v_generate_tbl_ddl
生成包含压缩设置的表DDL.
Suspect that the larger table has different compression settings or no compression at all. You can use our view v_generate_tbl_ddl
to generate table DDL that includes the compression settings.
即使具有相同的压缩设置,表大小也会因不同的排序键而有所不同.排序键用于将数据放入磁盘上的块中.如果一种排序键将许多相似的列值放在一起,则压缩效果会更好,所需空间也较小.
Even with the same compression settings table size can vary with different sort keys. The sort key is use to place the data into blocks on disk. If one sort key places lots of similar column values together it will compress better and require less space.
这篇关于红移表大小的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!