MySQL的慢与表中的大文本字段 [英] MySQL slow with large text fields in table

查看:156
本文介绍了MySQL的慢与表中的大文本字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们在MySQL(以及MariaDB)方面遇到了一个奇怪的问题.一个简单的数据库,带有2个表(InnoDB引擎),两个表都(除其他几个之外)包含3或4个带有XML数据的文本列.大小为1-5kB. 每个表大约有40000行,除了外键之外没有索引.

We're having a weird problem with MySQL (and also MariaDB). A simple database with 2 tables (InnoDB engine), both containing (among a few others) 3 or 4 text columns with XML data approx. 1-5kB in size. Each table has around 40000 rows and no indexes except those for foreign keys.

奇怪的部分正在运行选择语句. XML列未在select语句内的任何位置(select,where,order,group,...)使用,但它们减慢了执行速度.如果这些列为空,则select语句将在不到2秒的时间内执行,但是如果它们包含数据,则执行时间将跳至20秒左右.为什么呢?!

The weird part is running select statements. The XML columns are NOT used anywhere inside select statement (select, where, order, group, ...), yet they slow down execution. If those columns are null, select statement executes in less than 2 second, but if they contain data, execution time jumps to around 20 seconds. Why is that?!

这是一个脚本,可生成上述行为的示例:

This is a script that generates an example behaving like described above:

CREATE TABLE tableA (
    id                  bigint(20)      NOT NULL AUTO_INCREMENT,
    col1                bigint(20)      NULL,
    col2                bigint(20)      NULL,
    date1               datetime        NULL,
    largeString1        text            NULL,
    largeString2        text            NULL,
    largeString3        text            NULL,
    largeString4        text            NULL,
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;

CREATE TABLE tableB (
    id              bigint(20)  NOT NULL AUTO_INCREMENT,
    col1            bigint(20)  NULL,
    col2            varchar(45) NULL,
    largeString1    text        NULL,
    largeString2    datetime    NULL,
    largeString3    text        NULL,
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;

fillTables:

fillTables:

DELIMITER ;;
CREATE PROCEDURE `fillTables`(
    numRows INT
)
BEGIN

    DECLARE i INT;
    DECLARE j INT;
    DECLARE largeString TEXT;
    SET i = 1;

    START TRANSACTION;

    WHILE i < numRows DO
        SET j = 1;
        SET largeString = '';
        WHILE j <= 100 DO
            SET largeString = CONCAT(largeString, (SELECT UUID()));
            SET j = j + 1;
        END WHILE;

        INSERT INTO tableA (id, col1, col2, date1, largeString1,
                           largeString2, largeString3, largeString4)
           VALUES (i, FLOOR(1 + RAND() * 2), numRows - i, 
                   date_sub(now(), INTERVAL i hour),
                   largeString, largeString, largeString, largeString);
        INSERT INTO tableB (id, col1, col2, largeString1,
                           largeString2, largeString3)
           VALUES (numRows - i, i, (SELECT UUID()),
                   largeString, largeString, largeString);
        SET i = i + 1;
    END WHILE;

    COMMIT;

    ALTER TABLE tableA ADD FOREIGN KEY (col2) REFERENCES tableB(id);
    CREATE INDEX idx_FK_tableA_tableB ON tableA(col2);
    ALTER TABLE tableB ADD FOREIGN KEY (col1) REFERENCES tableA(id);
    CREATE INDEX idx_FK_tableB_tableA ON tableB(col1);

END ;;

测试

CREATE PROCEDURE `test`(
    _param1     bigint
    ,_dateFrom  datetime
    ,_dateTo    datetime
)
BEGIN

    SELECT 
        a.id
        ,DATE(a.date1) as date
        ,COALESCE(b2.col2, '') as guid
        ,COUNT(*) as count
    FROM
        tableA a
        LEFT JOIN tableB b1 ON b1.col1 = a.id
        LEFT JOIN tableB b2 ON b2.id = a.col2
    WHERE
        a.col1 = _param1
        AND (_dateFrom IS NULL OR DATE(a.date1) BETWEEN DATE(_dateFrom) AND DATE(_dateTo))
    GROUP BY
        a.id
        ,DATE(a.date1)
        ,b2.col2
    ;

END;;
DELIMITER ;

要使用随机数据填充表格

To populate the tables with random data use

call fillTables(40000);

用于检索数据的存储过程:

Stored procedure used for retrieving data:

call test(2, null, null);

此外,MSSQL可以在不到一秒的时间内执行select语句,而无需进行任何表优化(即使没有定义外键).

Also, MSSQL executes the select statement in a fraction of a second without any table optimization (even without foreign keys defined).

更新:

两个表的SHOW CREATE TABLE:

SHOW CREATE TABLE for both tables:

'CREATE TABLE `tableA` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `col1` bigint(20) DEFAULT NULL,
  `col2` bigint(20) DEFAULT NULL,
  `date1` datetime DEFAULT NULL,
  `largeString1` text,
  `largeString2` text,
  `largeString3` text,
  `largeString4` text,
  PRIMARY KEY (`id`),
  KEY `idx_FK_tableA_tableB` (`col2`),
  CONSTRAINT `tableA_ibfk_1` FOREIGN KEY (`col2`) REFERENCES `tableB` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'


'CREATE TABLE `tableB` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `col1` bigint(20) DEFAULT NULL,
  `col2` varchar(45) DEFAULT NULL,
  `largeString1` text,
  `largeString2` datetime DEFAULT NULL,
  `largeString3` text,
  PRIMARY KEY (`id`),
  KEY `idx_FK_tableB_tableA` (`col1`),
  CONSTRAINT `tableB_ibfk_1` FOREIGN KEY (`col1`) REFERENCES `tableA` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'

推荐答案

两个表都需要INDEX(col1).没有它,这些将需要进行表扫描:

Both tables need INDEX(col1). Without it, these need table scans:

WHERE a.col1 = _param1

ON b1.col1 = a.id

对于a,这将是覆盖"的,因此速度更快:

For a this would be 'covering', hence faster:

INDEX(col1, date1, id, col2)

除非需要,请不要使用LEFT.

Don't use LEFT unless you need it.

尽量不要在函数中隐藏列;它会阻止使用索引:

Try not to hide columns in functions; it prevents using indexes for them:

DATE(a.date1) BETWEEN ...

这可能适用于此:

    a.date1 >= DATE(_dateFrom)
AND a.date1  < DATE(_dateTo) + INTERVAL 1 DAY

关于20s与2s的奥秘-您是否两次运行了每个计时测试?第一次经常被I/O所困扰.第二个是内存受限的.

As for the mystery of 20s vs 2s -- Did you run each timing test twice? The first time is often bogged down with I/O; the second is memory-bound.

ROW_FORMAT

在InnoDB中有4个ROW_FORMATs;它们在处理大字符串(TEXTBLOB等)的方式上主要不同.您提到使用NULL字符串查询比使用非null字符串查询运行得更快.使用默认的ROW_FORMAT,一些或所有XML字符串与其余的列一起存储.经过一定的限制后,其余的将放置在另一个块中.

In InnoDB there are 4 ROW_FORMATs; they mostly differ in how they handle big strings (TEXT, BLOB, etc). You mentioned that the query ran faster with NULL strings than with non-null strings. With the default ROW_FORMAT, some or all of the XML strings is stored with the rest of the columns. After some limit, the rest is put in another block(s).

如果大字段是NULL,则几乎不占用空间.

If a large field is NULL, then it takes almost no space.

使用ROW_FORMAT=DYNAMIC(请参阅CREATE TABLEALTER TABLE),非null列将倾向于被推入其他块,而不是使记录的主要部分变大.

With ROW_FORMAT=DYNAMIC (see CREATE TABLE and ALTER TABLE), a non-null column will tend to be pushed to other blocks instead of making the main part of the record bulky.

这样的效果是允许更多的行适合单个块(溢出除外).这样一来,由于某些查询可以通过更少的I/O来获取更多信息,因此可以使它们更快地运行.

This has the effect of allowing more rows to fit in a single block (except for the overflow). That, in turn, allows certain queries to run faster since they can get more information with fewer I/Os.

阅读文档,我认为您需要这些:

Read the documentation, I think you need these:

SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=1;
ALTER TABLE tbl ROW_FORMAT=DYNAMIC;

在阅读文档时,您将遍历COMPRESSED.尽管这可能会使XML缩小3:1,但还有其他问题.我不知道它最终是否会变得更好.

In reading the documentation, you will run across COMPRESSED. Although this would shrink the XML by perhaps 3:1, there are other issues. I don't know whether it would end up being better or not.

缓冲池

innodb_buffer_pool_size应该大约是可用 RAM的70%.

innodb_buffer_pool_size should be about 70% of available RAM.

这篇关于MySQL的慢与表中的大文本字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆