Average row length higher than possible


Question

This is not a duplicate of "Why is InnoDB table size much larger than expected?" The answer to that question states that if I don't specify a primary key, 6 bytes are added to each row. I did specify a primary key, and there are more than 6 bytes to explain here.

I have a table that is expected to hold millions of records, so I paid close attention to the storage size of each column. Each row should take 15 bytes (two smallint columns at 2 bytes each, date = 3 bytes, datetime = 8 bytes).

CREATE TABLE archive (
  customer_id smallint(5) unsigned NOT NULL,
  calendar_date date NOT NULL,
  inserted datetime NOT NULL,
  value smallint(5) unsigned NOT NULL,
  PRIMARY KEY (`customer_id`,`calendar_date`,`inserted`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
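The 15-byte estimate can be reproduced by summing the column sizes. This is a minimal sketch in Python; the dictionary and function names are illustrative, and the sizes are the ones the question assumes (DATETIME takes 8 bytes before MySQL 5.6.4, while newer versions store it more compactly).

```python
# Fixed-width storage sizes in bytes, as assumed in the question.
# DATETIME is 8 bytes before MySQL 5.6.4; newer versions use 5 bytes
# plus optional fractional-seconds storage.
COLUMN_BYTES = {
    "customer_id": 2,    # smallint unsigned
    "calendar_date": 3,  # date
    "inserted": 8,       # datetime
    "value": 2,          # smallint unsigned
}


def expected_row_bytes(columns):
    """Sum the raw column sizes, ignoring any per-row engine overhead."""
    return sum(columns.values())


print(expected_row_bytes(COLUMN_BYTES))  # 15
```

This counts column data only; it deliberately leaves out whatever the storage engine adds per row.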

The table now has half a million records in it and is taking more storage than expected. I ran this query to get more details from the system:

SELECT *
  FROM information_schema.TABLES
 WHERE table_name = 'archive';


information_schema.index_length = 0
information_schema.avg_row_length = 37
information_schema.engine = InnoDB
information_schema.table_type = BASE TABLE

What!?

I was expecting 15 bytes per row, and it's taking 37. Can anyone give me an idea of where to look next for an explanation? I've done a lot of reading on this and I've seen some explanations for an extra 6 or 10 bytes being added to a row, but that doesn't explain the 22 extra bytes.

One explanation is that indexes also take up storage. There are no secondary indexes on this table (index_length is 0); the primary key is the only index.

One explanation is that the information_schema.tables query returns an unreliable row count, which would throw off avg_row_length. I have checked the row count it is using against a count(*) query and it is only off by a little (1/20 of 1%), so that's not the whole story.
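To see how little a 0.05% row-count error matters, here is a rough sketch in Python. The figures are approximations taken from the question (about 500,000 rows, a reported average of 37 bytes), and data_length is back-computed from them rather than measured.

```python
def avg_row_length(data_length, table_rows):
    """Average bytes per row for a given allocated size and row count."""
    return data_length / table_rows


reported_avg = 37                  # from information_schema
rows = 500_000                     # "half a million", approximate
data_length = reported_avg * rows  # back-computed, not a measured value
error = 0.0005                     # 1/20 of 1%

# Average if the true row count were 0.05% higher or lower than reported.
low = avg_row_length(data_length, rows * (1 + error))
high = avg_row_length(data_length, rows * (1 - error))
print(f"{low:.3f} to {high:.3f} bytes per row")
```

Under these assumptions the average moves by only a few hundredths of a byte, nowhere near the 22 bytes in question.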

Another explanation is fragmentation. Of note, this table has been rebuilt from a SQL dump, so it hasn't been hammered by updates, inserts and deletes.

Recommended answer

  • Because avg_row_length is data_length / rows.

    data_length is basically the total size of the table's data on disk (for InnoDB, the space allocated to the clustered index). An InnoDB table is more than just a list of rows, so there's that extra overhead.

  • Because an InnoDB row is more than just its data.

    Similar to above, each row comes with some overhead, which adds to the size of a row. An InnoDB table also isn't just a list of data crammed together. It needs a little extra empty space to work efficiently.

  • Because things are stored on disk in blocks, and those blocks aren't always full.

    Disks usually store things in 4K, 8K or 16K blocks. Sometimes things don't fit perfectly in those blocks, so you end up with some empty space.

    As we'll see below, MySQL allocates the table in blocks. It also allocates a lot more than it needs, to avoid having to grow the table (which can be slow and can lead to disk fragmentation, which makes things even slower).

    To illustrate this, let's start with an empty table.

    mysql> create table foo ( id smallint(5) unsigned NOT NULL );
    mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
    +-------------+------------+----------------+
    | data_length | table_rows | avg_row_length |
    +-------------+------------+----------------+
    |       16384 |          0 |              0 |
    +-------------+------------+----------------+
    

    It uses 16K, or four 4K blocks, to store nothing (16K is also InnoDB's default page size). The empty table doesn't need this space, but MySQL allocated it on the assumption that you're going to put a bunch of data in it. This avoids having to do an expensive reallocation on each insert.

    Now let's add a row.

    mysql> insert into foo (id) VALUES (1);
    mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
    +-------------+------------+----------------+
    | data_length | table_rows | avg_row_length |
    +-------------+------------+----------------+
    |       16384 |          1 |          16384 |
    +-------------+------------+----------------+
    

    The table didn't get any bigger; there's all that unused space within the 4 blocks it already has. There's one row, which means an avg_row_length of 16K. Clearly absurd. Let's add another row.

    mysql> insert into foo (id) VALUES (1);
    mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
    +-------------+------------+----------------+
    | data_length | table_rows | avg_row_length |
    +-------------+------------+----------------+
    |       16384 |          2 |           8192 |
    +-------------+------------+----------------+
    

    Same thing. 16K is allocated for the table, with 2 rows using that space. An absurd result of 8K per row.

    As I insert more and more rows, the table size stays the same, the rows use up more and more of the allocated space, and avg_row_length comes closer to reality.

    mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';                                                                     
    +-------------+------------+----------------+
    | data_length | table_rows | avg_row_length |
    +-------------+------------+----------------+
    |       16384 |       2047 |              8 |
    +-------------+------------+----------------+
    

    Here we also start to see table_rows become inaccurate. I definitely inserted 2048 rows.

    Now when I insert some more...

    mysql> select data_length, table_rows, avg_row_length from information_schema.tables where table_name = 'foo';
    +-------------+------------+----------------+
    | data_length | table_rows | avg_row_length |
    +-------------+------------+----------------+
    |       98304 |       2560 |             38 |
    +-------------+------------+----------------+
    

    (I inserted 512 rows, and table_rows has snapped back to reality for some reason.)

    MySQL decided the table needed more space, so it resized the table and grabbed a bunch more disk space. avg_row_length just jumped again.

    It grabbed a lot more space than it needs for those 512 rows; the table is now 96K, or 24 4K blocks, on the assumption that it will need the space later. This minimizes how many potentially slow reallocations it needs to do and minimizes disk fragmentation.
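The numbers in the tables above can be reproduced from the first bullet's formula. This is a small Python sketch; the integer (floor) division is an assumption that happens to match the reported values, not a statement of how MySQL rounds internally, and the function name is illustrative.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes


def avg_row_length(data_length, table_rows):
    """Allocated bytes divided by the row count; 0 for an empty table."""
    return data_length // table_rows if table_rows else 0


# (data_length, table_rows) pairs copied from the query results above.
samples = [(16384, 0), (16384, 1), (16384, 2), (16384, 2047), (98304, 2560)]

for data_length, table_rows in samples:
    pages = data_length // PAGE_SIZE
    print(data_length, table_rows, avg_row_length(data_length, table_rows), pages)
```

The jump from 8 to 38 comes entirely from data_length growing from one 16K page to six, while the row count rose by only a quarter.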

    This doesn't mean all that space was filled. It just means MySQL thought the table was full enough to need more space to run efficiently. If you want an idea of why that's so, look into how a hash table operates. I don't know if InnoDB uses a hash table, but the principle applies: some data structures operate best when there's some empty space.

    The disk space used by a table is directly related to the number of rows and the types of columns in the table, but the exact formula is difficult to figure out and will change from version to version of MySQL. Your best bet is to do some empirical testing and resign yourself to never getting an exact number.
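As a rough illustration of the per-row overhead for the archive table in the question, here is a back-of-the-envelope sketch in Python. It assumes the COMPACT or DYNAMIC row format, where the InnoDB documentation describes a 5-byte record header plus a 6-byte transaction ID and a 7-byte roll pointer on each clustered index record, and pages that end up about 15/16 full after sequential inserts. The constant and function names are illustrative, and none of this was measured on the asker's system.

```python
DATA_BYTES = 15      # column data per row, from the question
RECORD_HEADER = 5    # COMPACT/DYNAMIC record header (assumed row format)
TRX_ID = 6           # transaction ID stored with each clustered index record
ROLL_PTR = 7         # roll pointer stored with each clustered index record
FILL_FACTOR = 15 / 16  # approximate page fill after sequential inserts


def estimated_row_bytes(data_bytes, fill=FILL_FACTOR):
    """Rough bytes per row: column data plus hidden fields, spread over partly filled pages."""
    per_row = data_bytes + RECORD_HEADER + TRX_ID + ROLL_PTR
    return per_row / fill


print(round(estimated_row_bytes(DATA_BYTES), 1))
```

Under these assumptions most of the 22 unexplained bytes are accounted for; the remainder plausibly comes from page headers, the page directory, and pages that are allocated but not yet filled, which is why an exact figure stays out of reach.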
