MySQL:优化查询以从一组字符串中查找匹配的字符串 [英] MySQL: Optimized query to find matching strings from set of strings
问题描述
我有 10 组琴弦,每组有 9 根琴弦.在这 10 组中,第一组中的所有字符串的长度为 10,第二组中的所有字符串的长度为 9,依此类推.最后,第 10 组中的所有字符串的长度均为 1.
I am having 10 sets of strings each set having 9 strings. Of this 10 sets, all strings in first set have length 10, those in second set have length 9 and so on. Finally, all strings in 10th set have length 1.
每个集合中有(长度-2)个字符的公共前缀.并且前缀长度在下一组中减少1.因此,第一组共有 8 个字符,第二组共有 7 个,依此类推.
There is common prefix of (length-2) characters in each set. And the prefix length reduces by 1 in next set. Thus, first set has 8 characters in common, second has 7 and so on.
以下是 10 个集合的示例:
Here is what a sample of 10 sets look like:
pu3q0k0vwn
pu3q0k0vwp
pu3q0k0vwr
pu3q0k0vwq
pu3q0k0vwm
pu3q0k0vwj
pu3q0k0vtv
pu3q0k0vty
pu3q0k0vtz
pu3q0k0vw
pu3q0k0vy
pu3q0k0vz
pu3q0k0vx
pu3q0k0vr
pu3q0k0vq
pu3q0k0vm
pu3q0k0vt
pu3q0k0vv
pu3q0k0v
pu3q0k0y
pu3q0k1n
pu3q0k1j
pu3q0k1h
pu3q0k0u
pu3q0k0s
pu3q0k0t
pu3q0k0w
pu3q0k0
pu3q0k2
pu3q0k3
pu3q0k1
pu3q07c
pu3q07b
pu3q05z
pu3q0hp
pu3q0hr
pu3q0k
pu3q0m
pu3q0t
pu3q0s
pu3q0e
pu3q07
pu3q05
pu3q0h
pu3q0j
pu3q0
pu3q2
pu3q3
pu3q1
pu3mc
pu3mb
pu3jz
pu3np
pu3nr
pu3q
pu3r
pu3x
pu3w
pu3t
pu3m
pu3j
pu3n
pu3p
pu3
pu9
pud
pu6
pu4
pu1
pu0
pu2
pu8
pu
pv
0j
0h
05
pg
pe
ps
pt
p
r
2
0
b
z
y
n
q
要求:我有一个表 PROFILES,其中包含列 SRNO(类型 bigint,主键)和 UNIQUESTRING(类型 char(10),唯一键).我想从这 10 个集合中找到 450 个用于匹配 UNIQUESTRING 的 SRNO.
Requirement: I have a table PROFILES having columns SRNO (type bigint, primary key) and UNIQUESTRING (type char(10), unique key). I want to find 450 SRNOs for matching UNIQUESTRINGs from those 10 sets.
首先在第一组中查找like 字符串.如果我们没有得到足够的结果(即 450),请在第二组中查找 like 字符串.如果我们仍然没有得到足够的结果(450 减去 第一组的结果)在第三组中查找字符串 like.等等.
First find strings like in the first set. If we don't get enough results (ie. 450), find strings like in second set. If we still don't get enough results (450 minus results of first set) find strings like in third set. And so on.
现有解决方案:我写过类似的查询:
select srno from profiles
where ( (uniquestring like 'pu3q0k0vwn%')
or (uniquestring like 'pu3q0k0vwp%') -- all those above uniquestrings after this and finally the last one
or (uniquestring like 'n%')
or (uniquestring like 'q%')
)
limit 450
但是,在此 answer 中获得 Rick James 的反馈后,我意识到这不是优化的查询,因为它涉及很多行比它需要.所以我打算像这样重写查询:
However, after getting feedback from Rick James in this answer I realized this is not optimized query as it touches lot many rows than it needs. So I plan to rewrite the query like this:
(select srno from profiles where uniquestring like 'pu3q0k0vwn%' LIMIT 450)
UNION DISTINCT
(select srno from profiles where uniquestring like 'pu3q0k0vwp%' LIMIT 450); -- and more such clauses after this for each uniquestring
我想知道是否有更好的解决方案来做到这一点.
I like to know if there are any better solutions to do this.
推荐答案
SELECT ...
WHERE str LIKE 'pu3q0k0vw%' AND -- the 10-char set
str REGEXP '^pu3q0k0vw[nprqmj]' -- the 9 next letters
LIMIT ...
# then check for 450; if not enough, continue...
SELECT ...
WHERE str LIKE 'pu3q0k0vt%' AND -- the 10-char set
str REGEXP '^pu3q0k0vt[vyz]' -- the 9 next letters
LIMIT 450
# then check for 450; if not enough, continue...
etc.
SELECT ...
WHERE str LIKE 'pu3q0k0v%' AND -- the 9-char set
str REGEXP '^pu3q0k0v[wyzxrqmtv]' -- the 9 next letters
LIMIT ...
# check, etc; for a total of 10 SELECTs or 450 rows, whichever comes first.
这将是 10 多个选择.每个选择都将通过首先使用 LIKE 选择具有公共前缀的行进行某种程度的优化,然后使用 REGEXP 进行双重检查.
This will be 10+ selects. Each select will be somewhat optimized by first picking rows with a common prefix with LIKE, then it double checks with a REGEXP.
(如果您不喜欢将不一致的 pu3q0k0vw
与 pu3q0k0vt
分开;我们可以进一步讨论.)
(If you don't like splitting the inconsistent pu3q0k0vw
vs. pu3q0k0vt
; we can discuss things further.)
你说前缀";我已经对 LIKE 和 REGEXP 进行了编码,以在给定的前缀之后假设任意文本.
You say "prefix"; I have coded the LIKE and REGEXP to assume arbitrary text after the prefix given.
UNION
不可行,因为它(我认为)会在选择 450 之前收集所有行.每个 SELECT
将在 LIMIT
处停止> if 没有 DISTINCT
GROUP BY
或 ORDER BY
需要先收集所有内容.
UNION
is not viable, since it will (I think) gather all the rows before picking 450. Each SELECT
will stop at the LIMIT
if there is no DISTINCT
GROUP BY
or ORDER BY
that require gathering everything first.
REGEXP
不够智能,无法避免扫描整个表;添加 LIKE
可以避免这种情况(除非超过 20% 的行与 LIKE
匹配).
REGEXP
is not smart enough to avoid scanning the entire table; adding the LIKE
avoids such (except when more than, say, 20% of the rows match the LIKE
).
这篇关于MySQL:优化查询以从一组字符串中查找匹配的字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!