MySQL:优化查询以从一组字符串中查找匹配的字符串 [英] MySQL: Optimized query to find matching strings from set of strings

查看:96
本文介绍了MySQL:优化查询以从一组字符串中查找匹配的字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有 10 组琴弦,每组有 9 根琴弦.在这 10 组中,第一组中的所有字符串的长度为 10,第二组中的所有字符串的长度为 9,依此类推.最后,第 10 组中的所有字符串的长度均为 1.

I am having 10 sets of strings each set having 9 strings. Of this 10 sets, all strings in first set have length 10, those in second set have length 9 and so on. Finally, all strings in 10th set have length 1.

每个集合中有(长度-2)个字符的公共前缀.并且前缀长度在下一组中减少1.因此,第一组共有 8 个字符,第二组共有 7 个,依此类推.

There is common prefix of (length-2) characters in each set. And the prefix length reduces by 1 in next set. Thus, first set has 8 characters in common, second has 7 and so on.

以下是 10 个集合的示例:

Here is what a sample of 10 sets look like:

pu3q0k0vwn
pu3q0k0vwp
pu3q0k0vwr
pu3q0k0vwq
pu3q0k0vwm
pu3q0k0vwj
pu3q0k0vtv
pu3q0k0vty
pu3q0k0vtz

pu3q0k0vw
pu3q0k0vy
pu3q0k0vz
pu3q0k0vx
pu3q0k0vr
pu3q0k0vq
pu3q0k0vm
pu3q0k0vt
pu3q0k0vv

pu3q0k0v
pu3q0k0y
pu3q0k1n
pu3q0k1j
pu3q0k1h
pu3q0k0u
pu3q0k0s
pu3q0k0t
pu3q0k0w

pu3q0k0
pu3q0k2
pu3q0k3
pu3q0k1
pu3q07c
pu3q07b
pu3q05z
pu3q0hp
pu3q0hr

pu3q0k
pu3q0m
pu3q0t
pu3q0s
pu3q0e
pu3q07
pu3q05
pu3q0h
pu3q0j

pu3q0
pu3q2
pu3q3
pu3q1
pu3mc
pu3mb
pu3jz
pu3np
pu3nr

pu3q
pu3r
pu3x
pu3w
pu3t
pu3m
pu3j
pu3n
pu3p

pu3
pu9
pud
pu6
pu4
pu1
pu0
pu2
pu8

pu
pv
0j
0h
05
pg
pe
ps
pt

p
r
2
0
b
z
y
n
q

要求:我有一个表 PROFILES,其中包含列 SRNO(类型 bigint,主键)和 UNIQUESTRING(类型 char(10),唯一键).我想从这 10 个集合中找到 450 个用于匹配 UNIQUESTRING 的 SRNO.

Requirement: I have a table PROFILES having columns SRNO (type bigint, primary key) and UNIQUESTRING (type char(10), unique key). I want to find 450 SRNOs for matching UNIQUESTRINGs from those 10 sets.

首先在第一组中查找like 字符串.如果我们没有得到足够的结果(即 450),请在第二组中查找 like 字符串.如果我们仍然没有得到足够的结果(450 减去 第一组的结果)在第三组中查找字符串 like.等等.

First find strings like in the first set. If we don't get enough results (ie. 450), find strings like in second set. If we still don't get enough results (450 minus results of first set) find strings like in third set. And so on.

现有解决方案:我写过类似的查询:

select srno from  profiles
    where  ( (uniquestring like 'pu3q0k0vwn%')
              or  (uniquestring like 'pu3q0k0vwp%') -- all those above uniquestrings after this and finally the last one
              or  (uniquestring like 'n%')
              or  (uniquestring like 'q%')
           )
    limit  450

但是,在此 answer 中获得 Rick James 的反馈后,我意识到这不是优化的查询,因为它涉及很多行比它需要.所以我打算像这样重写查询:

However, after getting feedback from Rick James in this answer I realized this is not optimized query as it touches lot many rows than it needs. So I plan to rewrite the query like this:

(select srno from  profiles where uniquestring like 'pu3q0k0vwn%' LIMIT 450)
UNION DISTINCT
(select srno from  profiles where uniquestring like 'pu3q0k0vwp%' LIMIT 450); -- and more such clauses after this for each uniquestring 

我想知道是否有更好的解决方案来做到这一点.

I like to know if there are any better solutions to do this.

推荐答案

SELECT ...
    WHERE str   LIKE  'pu3q0k0vw%' AND -- the 10-char set
          str REGEXP '^pu3q0k0vw[nprqmj]'  -- the 9 next letters
    LIMIT ...
# then check for 450; if not enough, continue...
SELECT ...
    WHERE str   LIKE  'pu3q0k0vt%' AND -- the 10-char set
          str REGEXP '^pu3q0k0vt[vyz]'  -- the 9 next letters
    LIMIT 450
# then check for 450; if not enough, continue...
etc.
SELECT ...
    WHERE str   LIKE  'pu3q0k0v%' AND -- the 9-char set
          str REGEXP '^pu3q0k0v[wyzxrqmtv]'  -- the 9 next letters
    LIMIT ...
# check, etc; for a total of 10 SELECTs or 450 rows, whichever comes first.

这将是 10 多个选择.每个选择都将通过首先使用 LIKE 选择具有公共前缀的行进行某种程度的优化,然后使用 REGEXP 进行双重检查.

This will be 10+ selects. Each select will be somewhat optimized by first picking rows with a common prefix with LIKE, then it double checks with a REGEXP.

(如果您不喜欢将不一致的 pu3q0k0vwpu3q0k0vt 分开;我们可以进一步讨论.)

(If you don't like splitting the inconsistent pu3q0k0vw vs. pu3q0k0vt; we can discuss things further.)

你说前缀";我已经对 LIKE 和 REGEXP 进行了编码,以在给定的前缀之后假设任意文本.

You say "prefix"; I have coded the LIKE and REGEXP to assume arbitrary text after the prefix given.

UNION 不可行,因为它(我认为)会在选择 450 之前收集所有行.每个 SELECT 将在 LIMIT 处停止> if 没有 DISTINCT GROUP BYORDER BY 需要先收集所有内容.

UNION is not viable, since it will (I think) gather all the rows before picking 450. Each SELECT will stop at the LIMIT if there is no DISTINCT GROUP BY or ORDER BY that require gathering everything first.

REGEXP 不够智能,无法避免扫描整个表;添加 LIKE 可以避免这种情况(除非超过 20% 的行与 LIKE 匹配).

REGEXP is not smart enough to avoid scanning the entire table; adding the LIKE avoids such (except when more than, say, 20% of the rows match the LIKE).

这篇关于MySQL:优化查询以从一组字符串中查找匹配的字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆