了解Postgres行大小 [英] Making sense of Postgres row sizes

查看:135
本文介绍了了解Postgres行大小的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大型(> 100M行)Postgres表格,结构为{integer,integer,integer,timestamp without time zone}。我预计一行的大小为3 * integer + 1 * timestamp = 3 * 4 + 1 * 8 = 20个字节。



实际上,行大小是 pg_relation_size(tbl)/ count(*) = 52字节。为什么?



(没有对表进行删除: pg_relation_size(tbl,'fsm')〜= 0 )

解决方案

计算行大小比这更复杂。



存储通常在8 kb 数据页中分区。每页的固定开销很小,可能的余数不够大,不能适合另一个元组,更重要的是死行或最初使用 FILLFACTOR 设置保留的百分比。



更重要的是,每行开销 (元组)。 HeapTupleHeader 23个字节和 对齐填充 。元组标题的开始以及元组数据的开始在一个典型的64位机器上以8字节为单位的 MAXALIGN 的倍数排列。一些数据类型需要与2,4或8个字节的下一个倍数进行对齐。



在系统表上引用手册 pg_tpye


typalign 是存储此类型值时所需的对齐方式。
它适用于磁盘上的存储以及PostgreSQL中
值的大部分表示形式。当连续存储多个值
时,例如在
磁盘上的完整行的表示中,将填充插入到此类型的基准之前,以便
从指定的边界开始。对齐参考是序列中第一个数据的
开始。



可能的值为:




  • c = char 对齐,即不需要对齐。 / p>


  • s = short alignment(2 bytes在大多数机器上)。


  • = int 对齐(大多数机器上4字节)。


  • d = 双重对齐(许多机器上为8个字节,但绝非全部)。



阅读手册中的基本知识这里



您的示例



这将导致4个字节的填充, code> integer 列,因为时间戳列需要 double 对齐和需要以8个字节的下一个倍数开始。



所以,一行占用:

 code> 23  -  heaptupheader 
+ 1 - 填充或NULL位图
+ 12 - 3 *整数(无对齐填充)
+ 4 - 第3整数之后的填充
+ 8 - 时间戳
+ 0 - 没有padding,因为元组以MAXALIGN的倍数结束

最后,页眉中每个元组有一个 ItemData 指针(项目指针)(如在评论)占用4个字节:

  + 4  - 页眉中的项目指针
------
= 52个字节

所以我们到达观察到的 52字节



计算 pg_relation_size(tbl)/ count(*)是一个悲观的估计。 pg_relation_size(tbl)包括由 fillfactor 保留的膨胀(死行)和空格,以及每个数据页的开销,每张桌子(而且我们甚至没有在 TOAST表中提到长时间varlena数据的压缩,因为它不适用于此。)



您可以安装附加模块 pgstattuple ,并调用 SELECT * FROM pgstattuple('tbl_name'); 了解更多信息在表和元组大小上。



相关答案:




I got a large (>100M rows) Postgres table with structure {integer, integer, integer, timestamp without time zone}. I expected the size of a row to be 3*integer + 1*timestamp = 3*4 + 1*8 = 20 bytes.

In reality the row size is pg_relation_size(tbl) / count(*) = 52 bytes. Why?

(No deletes are done against the table: pg_relation_size(tbl, 'fsm') ~= 0)

解决方案

Calculation of row size is much more complex than that.

Storage is typically partitioned in 8 kb data pages. There is a small fixed overhead per page, possible remainders not big enough to fit another tuple, and more importantly dead rows or a percentage initially reserved with the FILLFACTOR setting.

More importantly, there is overhead per row (tuple). The HeapTupleHeader of 23 bytes and alignment padding. The start of the tuple header as well as the start of tuple data are aligned at a multiple of MAXALIGN, which is 8 bytes on a typical 64-bit machine. Some data types require alignment to the next multiple of 2, 4 or 8 bytes.

Quoting the manual on the system table pg_tpye:

typalign is the alignment required when storing a value of this type. It applies to storage on disk as well as most representations of the value inside PostgreSQL. When multiple values are stored consecutively, such as in the representation of a complete row on disk, padding is inserted before a datum of this type so that it begins on the specified boundary. The alignment reference is the beginning of the first datum in the sequence.

Possible values are:

  • c = char alignment, i.e., no alignment needed.

  • s = short alignment (2 bytes on most machines).

  • i = int alignment (4 bytes on most machines).

  • d = double alignment (8 bytes on many machines, but by no means all).

Read about the basics in the manual here.

Your example

This results in 4 bytes of padding after your 3 integer columns, because the timestamp column requires double alignment and needs to start at the next multiple of 8 bytes.

So, one row occupies:

   23   -- heaptupleheader
 +  1   -- padding or NULL bitmap
 + 12   -- 3 * integer (no alignment padding here)
 +  4   -- padding after 3rd integer
 +  8   -- timestamp
 +  0   -- no padding since tuple ends at multiple of MAXALIGN

Finally, there is an ItemData pointer (item pointer) per tuple in the page header (as pointed out by @A.H. in the comment) that occupies 4 bytes:

 +  4   -- item pointer in page header
------
 = 52 bytes

So we arrive at the observed 52 bytes.

The calculation pg_relation_size(tbl) / count(*) is a pessimistic estimation. pg_relation_size(tbl) includes bloat (dead rows) and space reserved by fillfactor, as well as overhead per data page and per table. (And we didn't even mention compression for long varlena data in TOAST tables, since it doesn't apply here.)

You can install the additional module pgstattuple and call SELECT * FROM pgstattuple('tbl_name'); for more information on table and tuple size.

Related answer:

这篇关于了解Postgres行大小的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆