COUNT and GROUP BY on text fields seems slow


Problem description


I'm building a MySQL database which contains entries about special substrings of DNA in species of yeast. My table looks like this:

+--------------+---------+------+-----+---------+-------+
| Field        | Type    | Null | Key | Default | Extra |
+--------------+---------+------+-----+---------+-------+
| species      | text    | YES  | MUL | NULL    |       |
| region       | text    | YES  | MUL | NULL    |       |
| gene         | text    | YES  | MUL | NULL    |       |
| startPos     | int(11) | YES  |     | NULL    |       |
| repeatLength | int(11) | YES  |     | NULL    |       |
| coreLength   | int(11) | YES  |     | NULL    |       |
| sequence     | text    | YES  | MUL | NULL    |       |
+--------------+---------+------+-----+---------+-------+

There are approximately 1.8 million records. In one type of query I want to see how many DNA substrings are associated with each type of species and region, so I issue this query:

select species, region, count(*) group by species, region;

The species and region columns have only two possible entries (conserved/scer for species, and promoter/coding for region) yet this query takes about 30 seconds.
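For reference, the statement as posted has no FROM clause, so it would not run as written. A minimal sketch of the complete query follows, assuming a hypothetical table name `repeats` (the post never names the table). Prefixing it with EXPLAIN shows which access path MySQL chooses.

```sql
-- `repeats` is a hypothetical table name; the original post does not give one.
SELECT species, region, COUNT(*) AS n
FROM repeats
GROUP BY species, region;

-- Same statement prefixed with EXPLAIN to inspect the execution plan.
EXPLAIN
SELECT species, region, COUNT(*) AS n
FROM repeats
GROUP BY species, region;
```

If the Extra column of the EXPLAIN output contains `Using temporary; Using filesort`, MySQL is building and sorting a temporary table to do the grouping, which is one plausible source of the long runtime.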

Is this a normal amount of time to expect for this type of query given the size of the table? Is it slow because I'm using text fields instead of simple integer or boolean values? (I prefer text fields, as several non-CS researchers will be using the DB.) Any other ideas and suggestions would be welcome.

Please excuse me if this is a boneheaded question; I am an SQL neophyte.

P.S. I've also seen this question but the proposed solution doesn't seem relevant for what I'm doing.

EDIT: Converting those fields to VARCHARs reduced the runtime to ~2.5 seconds. Note I also timed it against ENUMs which had a similar timing.
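The conversion described in the edit might look like the sketch below. The table name `repeats` and the VARCHAR length of 32 are assumptions, not details from the post. The ENUM values are the ones listed in the question.

```sql
-- `repeats` is a hypothetical table name; VARCHAR(32) is an assumed length.
-- Existing prefix indexes on the TEXT columns may need to be dropped and
-- recreated before or after the type change.

-- Option A: convert the two grouped columns to VARCHAR.
ALTER TABLE repeats
  MODIFY species VARCHAR(32),
  MODIFY region  VARCHAR(32);

-- Option B: convert them to ENUM, restricted to the values named in the question.
ALTER TABLE repeats
  MODIFY species ENUM('conserved', 'scer'),
  MODIFY region  ENUM('promoter', 'coding');
```

ENUM keeps the human-readable labels that non-CS users see in query results, while MySQL stores each value internally as a small integer.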

Solution

Why are all your string-based columns defined as TEXT? If you read the performance comparison, you'll see that TEXT was ~3x slower than a VARCHAR column using identical indexing: http://forums.mysql.com/read.php?24,105964,105964
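Beyond the column type, a composite index over both grouped columns may let MySQL answer the query from the index without reading the table rows. This is a general suggestion rather than something the answer states. The sketch assumes the hypothetical table name `repeats`, a hypothetical index name, and that the columns have already been converted to VARCHAR or ENUM, since TEXT columns can only be indexed with a prefix length.

```sql
-- Hypothetical table and index names; assumes species and region are no longer TEXT.
CREATE INDEX idx_species_region ON repeats (species, region);
```

If the optimizer picks this index, the Extra column of EXPLAIN for the GROUP BY query typically reports `Using index`.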
