Calculating and saving space in PostgreSQL
Question
I have a table in PostgreSQL like so:
CREATE TABLE t (
    a BIGSERIAL NOT NULL, -- 8 bytes
    b SMALLINT,           -- 2 bytes
    c SMALLINT,           -- 2 bytes
    d REAL,               -- 4 bytes
    e REAL,               -- 4 bytes
    f REAL,               -- 4 bytes
    g INTEGER,            -- 4 bytes
    h REAL,               -- 4 bytes
    i REAL,               -- 4 bytes
    j SMALLINT,           -- 2 bytes
    k INTEGER,            -- 4 bytes
    l INTEGER,            -- 4 bytes
    m REAL,               -- 4 bytes
    CONSTRAINT a_pkey PRIMARY KEY (a)
);
The above adds up to 50 bytes per row. In my experience I need another 40% to 50% for system overhead, even without any user-created indexes on the table. So, about 75 bytes per row. I will have many, many rows in the table, potentially upward of 145 billion, so the table is going to be pushing 13-14 terabytes. What tricks, if any, could I use to compact this table? My possible ideas are below.
Convert the real values to integer. If they can be stored as smallint, that is a saving of 2 bytes per field.
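Whether a real column can become a smallint depends on its range and on how much precision is needed. A minimal sketch in Python of the idea, assuming a hypothetical scale factor of 100 (two decimal places) and made-up sample values; only the smallint range itself comes from the PostgreSQL documentation:

```python
SMALLINT_MIN, SMALLINT_MAX = -32768, 32767  # documented PostgreSQL smallint range


def encode(value, scale):
    """Turn a real into a scaled integer (fixed point), e.g. 12.34 -> 1234."""
    return round(value * scale)


def decode(stored, scale):
    """Recover the approximate real value from the stored integer."""
    return stored / scale


def fits_smallint(values, scale):
    """True if every scaled value fits in a 2-byte smallint."""
    return all(SMALLINT_MIN <= encode(v, scale) <= SMALLINT_MAX for v in values)


# Hypothetical sample data and scale, for illustration only.
samples = [12.34, -250.5, 327.67]
scale = 100
print(fits_smallint(samples, scale))
print(encode(12.34, scale), decode(1234, scale))
```

If the check fails for real data, the column would need integer (4 bytes), which saves nothing over real.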
Convert the columns b .. m into an array. I don't need to search on those columns, but I do need to be able to return one column's value at a time. So, if I need column g, I could do something like
SELECT a, arr[6] FROM t; -- g is the 6th of b .. m; PostgreSQL arrays are 1-based by default
Would I save space with the array option? Would there be a speed penalty?
Any other ideas?
Recommended answer
I see nothing to gain (and something to lose) in storing several numeric fields in an array.
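A back-of-the-envelope comparison suggests why. The sketch below is in Python and rests on two assumptions worth verifying with pg_column_size on a real server: a one-dimensional array without NULLs carries a header of roughly 24 bytes, and every element has the same type, so the smallint columns would widen to 4 bytes.

```python
# Rough size comparison: 12 separate columns (b .. m) versus one 4-byte-element array.
# Assumed figures, not measured ones:
ARRAY_HEADER = 24  # length word, ndim, data offset, element type, one dimension, one lower bound
ELEM_SIZE = 4      # real or integer; an array has a single element type

column_sizes = {"b": 2, "c": 2, "d": 4, "e": 4, "f": 4, "g": 4,
                "h": 4, "i": 4, "j": 2, "k": 4, "l": 4, "m": 4}

as_columns = sum(column_sizes.values())                  # unpadded column data
as_array = ARRAY_HEADER + ELEM_SIZE * len(column_sizes)  # header plus 12 elements

print("separate columns:", as_columns, "bytes")
print("single array:    ", as_array, "bytes")
```

Under these assumptions the array is larger than the columns it replaces, before counting the cost of mixing reals and integers in one element type.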
The size of each numeric type is clearly documented. Simply use the smallest type compatible with the range and resolution you need; that is about all you can do.
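For the integer columns, that choice can be made mechanically from the expected minimum and maximum. A small Python sketch; the helper name smallest_integer_type is made up for illustration, while the sizes and ranges are the documented ones for smallint, integer and bigint:

```python
# Documented PostgreSQL integer types: (name, size in bytes, min, max)
INTEGER_TYPES = [
    ("smallint", 2, -2**15, 2**15 - 1),
    ("integer", 4, -2**31, 2**31 - 1),
    ("bigint", 8, -2**63, 2**63 - 1),
]


def smallest_integer_type(lo, hi):
    """Return (type name, bytes) of the narrowest type covering [lo, hi]."""
    for name, size, tmin, tmax in INTEGER_TYPES:
        if tmin <= lo and hi <= tmax:
            return name, size
    raise ValueError("range does not fit in bigint")


print(smallest_integer_type(0, 1000))
print(smallest_integer_type(0, 40000))
print(smallest_integer_type(1, 145_000_000_000))  # a key for 145 billion rows
```

Note that the last call shows the key column a cannot shrink: 145 billion exceeds the integer range.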
There is a byte alignment requirement for the columns within a row (smallint aligns to 2 bytes, integer and real to 4, bigint to 8), so reordering the columns can alter the space used. For this table the effect is small: only k, which follows the smallint j, needs 2 bytes of padding.
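PostgreSQL records an alignment for each type (typalign in the pg_type catalog), so the effect of column order can be checked with a little arithmetic. A rough Python sketch, assuming the usual alignments of 2, 4 and 8 bytes for these fixed-width types and a row whose data area starts on an 8-byte boundary:

```python
# (size, alignment) in bytes for the fixed-width types used in table t
TYPES = {"bigint": (8, 8), "integer": (4, 4), "real": (4, 4), "smallint": (2, 2)}


def data_size(columns):
    """Bytes of column data including alignment padding between columns."""
    offset = 0
    for type_name in columns:
        size, align = TYPES[type_name]
        offset += -offset % align  # pad up to the next multiple of the alignment
        offset += size
    return offset


# Column order a .. m as declared in the question.
original = ["bigint", "smallint", "smallint", "real", "real", "real", "integer",
            "real", "real", "smallint", "integer", "integer", "real"]

# One possible reordering: widest alignment first, so no padding is needed.
reordered = sorted(original, key=lambda t: TYPES[t][1], reverse=True)

print(data_size(original), data_size(reordered))
```

The whole row is also rounded up to a multiple of 8 bytes, so a 2-byte difference like this one may not change the on-disk size at all.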
By the way, there is a fixed overhead per row of about 23 bytes for the tuple header, plus a 4-byte item pointer in the page.
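Putting the overhead together with the column data gives a rough heap size for the table. The Python sketch below assumes the default 8 kB block size, a 24-byte page header, no NULL values (so no NULL bitmap), completely full pages, and 52 bytes of column data per row (the 50 bytes from the question plus 2 bytes of alignment padding before k). It ignores the primary key index.

```python
PAGE_SIZE = 8192    # default PostgreSQL block size
PAGE_HEADER = 24    # per-page header
TUPLE_HEADER = 24   # 23-byte tuple header rounded up to an 8-byte boundary
ITEM_POINTER = 4    # per-row line pointer stored in the page
MAXALIGN = 8        # rows are padded to a multiple of 8 bytes


def heap_estimate(data_bytes, rows):
    """Return (bytes per row, rows per page, total heap bytes)."""
    tuple_size = TUPLE_HEADER + data_bytes
    tuple_size += -tuple_size % MAXALIGN          # round the row up to 8 bytes
    per_row = tuple_size + ITEM_POINTER
    rows_per_page = (PAGE_SIZE - PAGE_HEADER) // per_row
    pages = -(-rows // rows_per_page)             # ceiling division
    return per_row, rows_per_page, pages * PAGE_SIZE


per_row, rows_per_page, total = heap_estimate(52, 145_000_000_000)
print(per_row, "bytes per row,", rows_per_page, "rows per page")
print(round(total / 1e12, 2), "TB for the heap alone")
```

Under these assumptions the per-row cost is dominated by fixed overhead and rounding, which is why trimming a couple of bytes from individual columns may not shrink the table.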