为什么AVG行长是预期的4倍? [英] Why is the AVG row length 4 times as large as expected?

查看:54
本文介绍了为什么AVG行长是预期的4倍?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这张桌子:

CREATE TABLE `tree_paths` (
  `child_id` int(10) unsigned NOT NULL,
  `parent_id` int(10) unsigned NOT NULL,
  `sponsor_upline` tinyint(1) NOT NULL DEFAULT '0',
  `left_leg` tinyint(1) NOT NULL,
  `binary_level` smallint(5) unsigned NOT NULL DEFAULT '0',
  `sponsor_level` smallint(5) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`child_id`,`parent_id`),
  KEY `tree_paths_parent_id_foreign` (`parent_id`),
  CONSTRAINT `tree_paths_child_id_foreign` FOREIGN KEY (`child_id`) REFERENCES `users` (`id`) ON DELETE CASCADE,
  CONSTRAINT `tree_paths_parent_id_foreign` FOREIGN KEY (`parent_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;

int(10)消耗4字节,smallint(5) 2字节,tinyint(1) 1字节.因此,一行原始数据应为14字节.

An int(10) consumes 4 Byte, a smallint(5) 2 Byte, a tinyint(1) 1 Byte. Thus, one row raw data should be 14 Byte.

我插入了约1600万行.由于某些原因,AVG行长度为57字节而不是14字节.

I have ~16 Million rows inserted. For some reason the AVG row length is 57 Byte instead of 14 Byte.

主元组键的B树使用的存储量是数据本身的3倍是否现实?

Is it realistic, that the B-tree for the primary tuple key uses 3 times as much storage as the data itself?

此外,我已经读过 https://stackoverflow.com/a/30318725/2311074 >

In addition, I have read in https://stackoverflow.com/a/30318725/2311074 that

主键已经在DATA_LENGTH中进行了计算. INDEX_LENGTH表示所有二级索引的大小".

the primary key is already accounted in the DATA_LENGTH. INDEX_LENGTH means "the size of all secondary indexes".

如果是这样,那么为什么index_length为总大小的1/4?就像上面看到的那样,我没有辅助键.还是来自FOREIGN键?

If that is true, then why is index_length 1/4 of the total size? AS one can see above, I have no secondary keys. Or does it come from the FOREIGN keys?

备注:

我发现了与我类似的问题.

I found similar questions to mine.

一个是为什么InnoDB表大小比预期的大得多?,但它已经10岁了,我一点都不明白.该问题的答案是,隐藏的6字节列是造成此差异的原因.但是,差异不是6,而是36.并且用户未使用任何键,但Index_length仍然很大.所以我不知道这有什么意义,但是我想十年前情况可能有所不同.

One is Why is InnoDB table size much larger than expected? but it is 10 years old and I don't get it at all. The answer of that question is that a hidden 6-byte column is the reason for the discrepancy there. However, the discrepancy is not 6 but 36. And the user did not use any key, still Index_length is huge. So I have no idea how any of that makes sense, but I guess 10 years ago things were maybe different.

我找到的另一个答案是平均行长超出了可能 .这个答案的重点是InnoDb分配了比所需更多的空间.但是,当我检查show table status时,我看到还有7.3 MB的可用空间.所以看来我可以驳斥这一点.

Another answer that I found was at Average row length higher than possible. The point in this answer is that InnoDb just allocated more space then it needs. However, when I check show table status I see that I have 7.3 MB free left. So it seems I can dismiss this.

推荐答案

平均行大小偏高的原因很多.

There are many reasons for the avg row size to be high.

  • 这是一个近似值. (我发现它通常高2到3倍.)在一种极端情况下(表中的一行),每行将占用16384个字节.那是一个InnoDB块.表中的行数是估计的.用于行的磁盘空间是准确的,但请参见下面的开销.平均行大小是这两者的商.

  • It is an approximation. (I have found that it is typically 2x-3x high.) In one extreme case -- one row in the table -- it will claim 16384 bytes per row. That is one InnoDB block. The number of rows in the table is estimated. The disk space used for the rows is exact, but see overheads, below. The avg row size is the quotient of those two.

每列的开销-1或2个字节

Overhead per column -- 1 or 2 bytes

每行的开销-20-30字节-用于处理事务,在块中查找行等

Overhead per row -- 20-30 bytes -- for handling transactions, finding rows in a block, etc

每个块的开销-每个16KB块一定数量的字节

Overhead per block -- some number of bytes per 16KB block

在BTree中颠簸的开销-最小值约为一个块的1/16,最大值约为该块的一半,经过大量删除和/或随机插入后,平均值约为30%.

Overhead for thrashing in a BTree -- min is about 1/16 of a block, max is about half the block, the average is about 30% after lots of deletes and/or random inserts.

用于预先分配大块磁盘空间(1MB?8MB?)的开销

Overhead for pre-allocating chunks of disk space (1MB? 8MB?)

随着一张桌子的尺寸从一个块开始增长,布局算法将发生变化,并且间接费用的百分比会暂时上升.

As a table grows from fitting in one block, the layout algorithm shifts, and the percentage of overhead temporarily spikes.

已删除的行不会将其空间返回给操作系统,因此文件大小保持不变,从而增加了 apparent 行的大小.

Deleted rows do not return their space to the OS, so the file size stays constant, thereby increasing the apparent row size.

如果您没有可提升为PK的显式PRIMARY KEYUNIQUE密钥,则PK会有一个不可访问的6字节字段(每行).

If you do not have an explicit PRIMARY KEY or a UNIQUE key that can be promoted to PK, then there is an inaccessibly 6-byte field (per row) for the PK.

TEXT/BLOB甚至VARCHAR被记录外"存储.这使计算复杂得多.它取决于您使用的是4个ROW_FORMATs中的哪个.在某些情况下,存在一个20字节的指针"指针.每个这样的单元格.

Large TEXT/BLOB and even VARCHAR are stored "off-record". This complicates the computations a lot. And it is dependent on which of the 4 ROW_FORMATs you are using. In some cases there is a 20-byte "pointer" for each such cell.

FOREIGN KEY约束不会增加所需的空间,只是它们 可能会强制创建索引.

FOREIGN KEY constraints do not add to the space required, except that they may force the creation of an index.

INDEXes均不包含在avg_row_length中.

INDEXes, other than the PRIMARY KEY are no included in the avg_row_length.

PRIMARY KEY 通常通常在 data BTree中的开销很小.一个简单的经验法则是开销为1%(在列本身之上).这是BTree的非叶子节点.

The PRIMARY KEY usually involves very little overhead in the data BTree. A simple Rule of Thumb is 1% overhead (on top of the column, itself). This overhead is the non-leaf nodes of the BTree.

当InnoDB事务繁忙时,任何已修改的行都保留在历史记录列表"中.这会导致更多开销.

While an InnoDB transaction is busy, any modified rows are held onto in the "history list". This leads to more overhead.

(不完全相关). InnoDB的COMPRESSED存在问题-与通常的3x文本压缩不同,它仅提供大约2x的压缩.由于需要同时将buffer_pool中的压缩数据和未压缩数据同时存储(至少用于某些块),因此要花费一些RAM.

(Not totally related). InnoDB's COMPRESSED has issues -- it gives only about 2x compression, unlike typical text compression of 3x. It costs some RAM because of needing to have both the compressed and uncompressed data in the buffer_pool at the same time (for at least some blocks).

SHOW TABLE STATUS和从information_schema.TABLES提取将提供相同的数据.有一些方法可以使数据和每个表对B + Tree的深度有一定的了解.

SHOW TABLE STATUS and fetching from information_schema.TABLES gives the same data. There are ways to get some insight into the depth of the B+Tree for the data and for each table.

这篇关于为什么AVG行长是预期的4倍?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆