How to calculate the disk space used by a MySQL table from column datatypes and number of rows?


Problem description


      I have a MySQL table with more than 3 billion rows hosted on Google Cloud SQL.

      I wish to understand how the total size on disk can be explained from the column data-types, number of rows and the indexes.

      I was hoping that it would be something like

      size_of_table_in_bytes = num_rows * [ Sum over i {bytes_for_datatype_of_column(i)}
                                          + Sum over j  {Bytes_for_index(j)} ]
      

      But the disk size I end up with does not match the size my database reports.

      Using bytes per datatype on

      https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html

      and additional bytes in InnoDB header and indexes from

      https://dev.mysql.com/doc/refman/5.7/en/innodb-physical-record.html#innodb-compact-row-format-characteristics

      Here is my understanding of the bytes occupied by the header, each column and each index:

      TABLE `depth` (
      Bytes| Column/Header/Index
          2| variable length header Ceil(num columns/8) = Ceil (10/8)
          5| Fixed Length Header
          3|`date` date DEFAULT NULL,
          7|`receive_time` datetime(3) DEFAULT NULL,
          8|`instrument_token` bigint(20) unsigned DEFAULT NULL,
          1|`level` tinyint(3) unsigned DEFAULT NULL,
          2|`bid_count` smallint(5) unsigned DEFAULT NULL,
          8|`bid_size` bigint(20) unsigned DEFAULT NULL,
          4|`bid_price` float DEFAULT NULL,
          4|`ask_price` float DEFAULT NULL,
          8|`ask_size` bigint(20) unsigned DEFAULT NULL,
          2|`ask_count` smallint(5) unsigned DEFAULT NULL,
          6|KEY `date_time_sym` (`date`,`receive_time`,`instrument_token`),
          6|KEY `date_sym_time` (`date`,`instrument_token`,`receive_time`)
       ) ENGINE=InnoDB DEFAULT CHARSET=utf8
      

      which totals to 72 bytes.

      But as per SHOW TABLE STATUS, Avg_row_length = 79.

      Question 1: Where am I getting the number of bytes per row wrong?

      I am reasonably sure that there are no nulls in my data.

      Assuming I am making some mistake in counting bytes, I take 79 bytes per row and count rows using SELECT COUNT(*), which gives 3,017,513,240:

      size_of_table_in_bytes = 79*3,017,513,240 = 238,383,545,960
      
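      The same estimate can be computed in one query. A minimal sketch, assuming the `depth` table and the hand-counted figure of 79 bytes per row from the text above (neither comes from any MySQL built-in):

      -- Naive size estimate: assumed bytes-per-row times the exact row count.
      -- COUNT(*) is exact for InnoDB, unlike the approximate Rows column
      -- reported by SHOW TABLE STATUS.
      SELECT 79 * COUNT(*) AS estimated_table_bytes
      FROM depth;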

      Another way to get the size is to use the MySQL query

      SHOW TABLE STATUS from mktdata where Name = "depth";
      

      Here I get a one-row table as output, with the values of a few important fields being:

      Name: depth
      Engine:InnoDB
      Version:10
      Row_format:Dynamic
      Rows: 1,720,821,447
      Avg_row_length: 78
      Index_length: 183,900,307,456
      Data_length: 135,245,332,480
      

      At first I was alarmed that Rows is 1.7 billion instead of 3.01 billion, but I found this in the documentation:

      • Rows

      The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.

      So 3.01 billion seems right for the number of rows, and therefore I expect the table size to be 238 GB.

      But then, if I add up Data_length and Index_length, I get 319,145,639,936.
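      These two figures can also be read from information_schema.TABLES instead of parsing SHOW TABLE STATUS output; this is standard MySQL, shown here as a convenience rather than something from the original post:

      -- DATA_LENGTH and INDEX_LENGTH mirror the SHOW TABLE STATUS fields;
      -- TABLE_ROWS is the same approximate count, so treat it with suspicion.
      SELECT TABLE_NAME,
             TABLE_ROWS,
             AVG_ROW_LENGTH,
             DATA_LENGTH,
             INDEX_LENGTH,
             DATA_LENGTH + INDEX_LENGTH AS total_bytes
      FROM information_schema.TABLES
      WHERE TABLE_SCHEMA = 'mktdata'
        AND TABLE_NAME = 'depth';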

      Question 2: Why do I get 319 GB by one method and 238 GB by the other? Which one, if any, is right?

      Moreover, the overall size shown for the SQL database by the Google Cloud SQL Console is 742 GB. The only other table I have, trade, has exactly 1/5th the number of rows of depth and 5 columns. Its size, summing Data_length and Index_length, is 57 GB. Adding up both table sizes, I get 376 GB.
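      The whole-schema total can be computed by aggregating the same view; a small query (again standard MySQL, not from the original post), which should reproduce the 376 GB figure:

      -- Sum of data plus index bytes across every table in the schema.
      SELECT SUM(DATA_LENGTH + INDEX_LENGTH) AS total_schema_bytes
      FROM information_schema.TABLES
      WHERE TABLE_SCHEMA = 'mktdata';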

      Question 3: 742 GB seems to be roughly twice 376 GB (twice would actually be 752). Could this be because of the backup? I know Google Cloud SQL does an automatic backup once a day.

      Because of the plausibility of Question 3 above, I suspect that my simple method of size = num_rows * num_bytes_per_row is wrong! This is really troubling me, and I would appreciate any help in resolving it.

      Solution

      • There is more overhead than you mentioned. 20 bytes/row might be close.
      • Don't trust SHOW TABLE STATUS for giving "Rows", use SELECT COUNT(*) ... Notice how it was off by nearly a factor of 2.
      • Compute the other way: 135245332480 / 3017513240 = 45 bytes (see the sketch after this list).
      • From 45 bytes, I deduce that a lot of the cells are NULL?
      • Each column in each row has 1- or 2-byte overhead.
      • The ROW_FORMAT matters.
      • TEXT and BLOB (etc) have radically different rules than simple datatypes.
      • The indexes take a lot more than the 6 bytes you mentioned (see your other posting).
      • BTree structure has some overhead. When loaded in order, 15/16 of each block is filled (that is mentioned somewhere in the docs). After churn, the fill factor can easily range from 50% to 100%; a BTree gravitates toward 69% full (hence the 1.45 in the other posting).
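      A minimal sketch of that reverse check, assuming the mktdata.depth table from the question; it divides the actual on-disk data size by an exact row count rather than trusting any approximation:

      -- Actual average bytes per row = data size / exact row count.
      -- With the figures quoted above: 135245332480 / 3017513240 ~ 44.8 bytes.
      SELECT (SELECT DATA_LENGTH
              FROM information_schema.TABLES
              WHERE TABLE_SCHEMA = 'mktdata' AND TABLE_NAME = 'depth')
             / (SELECT COUNT(*) FROM depth) AS actual_avg_bytes_per_row;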

      Reserving an equal amount of space for backup...

      • I don't know if that is what they are doing.
      • If they use mysqldump (or similar), it is not a safe formula -- the text dump of the database could be significantly larger or smaller.
      • If they use LVM, then they have room for a full binary dump. But that does not make sense because of COW.
      • (So, I give up on Q3.)

      Could the Cloud service be doing some kind of compression?
