连接表时需要更快的FIND_IN_SET替代方法 [英] need faster alternative for FIND_IN_SET when Joining tables

查看:1127
本文介绍了连接表时需要更快的FIND_IN_SET替代方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果要在TABLE2逗号分隔列上找到TABLE1 ID,我想执行选择.

I want to perform select if TABLE1 ID is found on TABLE2 comma separated column.

table1:
+---------+
| id      |
+---------+
|12345678 |
+---------+
|12322222 |
+---------+

table2:
+---------------------------------------------+
| manyids                                     |
+---------------------------------------------+
|12345678,1111111,2222233,0000111,65321       |
+---------------------------------------------+
|2222233,12322222                             |
+---------------------------------------------+
|12322222                                     |
+---------------------------------------------+

这在较小的测试表上工作正常:

This is working fine on smaller test table:

SELECT table1.id,
COUNT(table1.id) AS occurences
FROM table1 JOIN table2 ON FIND_IN_SET(table1.id, table2.manyids ) > 0
GROUP BY table1.id HAVING occurences > 0 
ORDER BY occurences DESC

但是我要执行选择的实际TABLE1有超过500k行,并且FIND_IN_SET太慢了.有其他选择吗?

However actual TABLE1 I want to perform select has over 500k rows and FIND_IN_SET is just too slow. Any alternatives?

推荐答案

唯一可行的明智替代方案是对表格进行规范化

The only sensible alternative will be to normalize the tables:

示例

table tag
---------
id integer auto_increment primary key
name varchar(40)

table article
-------------
id integer auto_increment primary key
title varchar(1000)
content text

table tag_link
--------------
article_id integer foreign key references article(id)
tag_id integer foreign key references tag(id)
primary key article_id, tag_id

由于所有字段均已建立索引,因此您可以像这样轻松而非常快速地查询:

Because all the fields are indexed, you can query easily and very very fast like so:

SELECT t.name FROM article AS a
INNER JOIN tag_link tl ON (tl.article_id = a.id)
INNER JOIN tag t ON (t.id = tl.tag_id)
WHERE a.id = '45785'

选项2不好的主意,比选项1糟糕得多
如果您确实无法更改设置,请在字段manyids上创建一个fulltext索引.

Option 2 bad idea, much worse than option 1
If you really really cannot change the setup, create a fulltext index on field manyids.

并将查询更改为:

SELECT table1.id,
COUNT(table1.id) AS occurences
FROM table1 
JOIN table2 ON MATCH(table2.manyids) 
               AGAINST (CONCAT("+'",table1.id,"'") IN BOOLEAN MODE)  
/*boolean mode is required*/
GROUP BY table1.id HAVING occurences > 0 
ORDER BY occurences DESC

如果停用词列表中有任何ID,则将不匹配.请注意,此列表中没有数字.

If any id is in the stopword list it will not match. Note that there are no numbers in this list.

链接
http://dev.mysql.com/doc/refman/5.5/en/fulltext- stopwords.html
http://dev.mysql.com/doc/refman/5.5/en/fulltext- boolean.html

Links
http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html
http://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html

请注意,您将需要调整全文索引考虑的最小和最大单词长度:请参见:http://dev.mysql.com/doc/refman/5.5/en/fulltext-fine-tuning.html

Note that you will need to tweak the min and max word length that the full-text index takes into account: see: http://dev.mysql.com/doc/refman/5.5/en/fulltext-fine-tuning.html

这篇关于连接表时需要更快的FIND_IN_SET替代方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆