Making sense of Postgres row sizes


Question

I have a large (>100M rows) Postgres table with the structure {integer, integer, integer, timestamp without time zone}. I expected the size of a row to be 3*integer + 1*timestamp = 3*4 + 1*8 = 20 bytes.

In reality, the row size is pg_relation_size(tbl) / count(*) = 52 bytes. Why?

(No deletes are done against the table: pg_relation_size(tbl, 'fsm') ~= 0)

Recommended answer

Calculating the row size is considerably more complex than that.

Storage is typically partitioned into 8 kB data pages. Each page has a small fixed overhead, a possible remainder that is too small to fit another tuple and, more importantly, dead rows or a percentage initially reserved with the FILLFACTOR setting.
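
To make the per-page overhead concrete, here is a minimal Python sketch of how many rows fit into one page. It assumes the default 8 kB block size, a 24-byte fixed page header, and a simplified model of FILLFACTOR in which the reserved percentage is simply subtracted from the usable space. The function names are hypothetical, and the 52 bytes per row are the figure observed in the question.

```python
PAGE_SIZE = 8192          # default block size (8 kB)
PAGE_HEADER_BYTES = 24    # assumed size of the fixed page header


def rows_per_page(row_bytes, fillfactor=100):
    """Rows that fit into one page.

    row_bytes must include the 4-byte item identifier.
    Simplified model: the share reserved by fillfactor is
    subtracted up front from the usable space.
    """
    reserved = PAGE_SIZE * (100 - fillfactor) // 100
    usable = PAGE_SIZE - PAGE_HEADER_BYTES - reserved
    return usable // row_bytes


def leftover_bytes(row_bytes, fillfactor=100):
    """Bytes at the end of a full page that are too few for another row."""
    reserved = PAGE_SIZE * (100 - fillfactor) // 100
    usable = PAGE_SIZE - PAGE_HEADER_BYTES - reserved
    return usable - rows_per_page(row_bytes, fillfactor) * row_bytes


if __name__ == "__main__":
    print(rows_per_page(52))        # rows per page at fillfactor 100
    print(leftover_bytes(52))       # unusable remainder per page
    print(rows_per_page(52, 90))    # rows per page at fillfactor 90
```

Under these assumptions the remainder per page is tiny, so for this table the page overhead adds only a fraction of a byte per row. Dead rows or a lower FILLFACTOR would have a much larger effect.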

There is even more overhead per row (tuple): an item identifier of 4 bytes at the start of the page, the HeapTupleHeader of 23 bytes, and alignment padding. The start of the tuple header as well as the start of the tuple data are aligned at a multiple of MAXALIGN, which is 8 bytes on a typical 64-bit machine. Some data types require alignment to the next multiple of 2, 4 or 8 bytes.
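
The rounding to MAXALIGN can be sketched in a few lines of Python. The helper names are hypothetical. The sketch also assumes that a NULL bitmap of one bit per column (rounded up to whole bytes) follows the 23-byte header whenever the row contains a NULL value, and that MAXALIGN is 8.

```python
MAXALIGN = 8               # typical 64-bit machine
TUPLE_HEADER_BYTES = 23    # HeapTupleHeader


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment


def data_offset(n_columns, has_nulls):
    """Offset at which the tuple data starts, relative to the tuple start.

    Assumption: the NULL bitmap is only stored when the row has a NULL
    and takes one bit per column, rounded up to whole bytes.
    """
    bitmap_bytes = (n_columns + 7) // 8 if has_nulls else 0
    return align_up(TUPLE_HEADER_BYTES + bitmap_bytes, MAXALIGN)


if __name__ == "__main__":
    print(data_offset(4, has_nulls=False))  # 23 bytes + 1 byte of padding
    print(data_offset(4, has_nulls=True))   # the bitmap fits into that byte
    print(data_offset(9, has_nulls=True))   # a 2-byte bitmap pushes to the next multiple
```

For the four-column table in the question, the data starts at byte 24 either way: the byte after the header is either padding or the NULL bitmap.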

Quoting the manual on the system table pg_type:

typalign is the alignment required when storing a value of this type. It applies to storage on disk as well as most representations of the value inside PostgreSQL. When multiple values are stored consecutively, such as in the representation of a complete row on disk, padding is inserted before a datum of this type so that it begins on the specified boundary. The alignment reference is the beginning of the first datum in the sequence.

Possible values are:

  • c = char alignment, i.e., no alignment needed.

  • s = short alignment (2 bytes on most machines).

  • i = int alignment (4 bytes on most machines).

  • d = double alignment (8 bytes on many machines, but by no means all).

Read about the basics in the manual.

This results in 4 bytes of padding after your 3 integer columns, because the timestamp column requires double alignment and needs to start at the next multiple of 8 bytes.
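
The padding can be reproduced with a small Python sketch that walks the columns in order and aligns each one. The type table and the function name are hypothetical and cover only a few fixed-length types. Sizes and alignments are those of a typical 64-bit machine, and offsets are counted from the start of the tuple data.

```python
# (size in bytes, required alignment in bytes) on a typical 64-bit machine
TYPES = {
    "int2": (2, 2),        # typalign s
    "int4": (4, 4),        # typalign i (integer)
    "int8": (8, 8),        # typalign d
    "timestamp": (8, 8),   # typalign d
}


def layout(columns):
    """Return ([(type, start_offset, padding_before)], total_data_bytes)."""
    offset = 0
    result = []
    for name in columns:
        size, align = TYPES[name]
        start = -(-offset // align) * align   # round up to the alignment
        result.append((name, start, start - offset))
        offset = start + size
    return result, offset


if __name__ == "__main__":
    cols, total = layout(["int4", "int4", "int4", "timestamp"])
    for name, start, padding in cols:
        print(f"{name:10} starts at {start:2}, padding before: {padding}")
    print("data bytes:", total)
```

The three integers end at byte 12, which is not a multiple of 8, so the timestamp starts at byte 16 and the tuple data takes 24 bytes instead of 20.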

So, one row occupies:

   23   -- HeapTupleHeader
 +  1   -- padding or NULL bitmap
 + 12   -- 3 * integer (no alignment padding here)
 +  4   -- padding after 3rd integer
 +  8   -- timestamp
 +  0   -- no padding since tuple ends at multiple of MAXALIGN

Plus the item identifier per tuple in the page header (as pointed out by @A.H. in a comment):

 +  4   -- item identifier in page header
------
 = 52 bytes

So we arrive at the observed 52 bytes.

The calculation pg_relation_size(tbl) / count(*) is a pessimistic estimate. pg_relation_size(tbl) includes bloat (dead rows) and space reserved by fillfactor, as well as overhead per data page and per table. (And we didn't even mention compression for long varlena data in TOAST tables, since it doesn't apply here.)
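
A rough Python sketch shows why the estimate errs on the high side even without bloat. It assumes 157 rows per 8 kB page, which is what a 24-byte page header and 52-byte rows would allow, and a table with exactly 100 million rows. Both the function name and these inputs are illustrative assumptions, not measurements.

```python
def avg_bytes_per_row(n_rows, rows_per_page, page_size=8192):
    """Average bytes per row when only whole pages are counted.

    Mimics pg_relation_size(tbl) / count(*) for a table without
    dead rows, packed at rows_per_page rows per page.
    """
    pages = -(-n_rows // rows_per_page)   # ceiling division
    return pages * page_size / n_rows


if __name__ == "__main__":
    estimate = avg_bytes_per_row(100_000_000, 157)
    print(f"{estimate:.3f} bytes per row")
```

Under these assumptions the result lands slightly above 52, because the page header and the unusable remainder of each page are spread across its rows.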

You can install the additional module pgstattuple and call SELECT * FROM pgstattuple('tbl_name'); for more information on table and tuple size.
