How to use REGEXP on the results of a subquery?


Question

I have two tables.

A users table, with columns id and phone_no:

id  phone_no
1   9912678
2   9912323
3   9912366

An admission table, with columns id and phone_no:

id  phone_no
6   991267823
7   991236621
8   435443455
9   243344333

I want to find all phone numbers in the admission table that follow the same pattern as the numbers in the users table, and update them in the users table.

So I am trying:

select phone_no  from admission where phone_no REGEXP (SELECT phone_no
FROM  `users` AS user
WHERE user.phone_no REGEXP  '^(99)+[0-9]{8}')

But I get this error: Subquery returns more than 1 row

Any help is appreciated.

Recommended answer

Try one of the following queries:

SELECT a.phone_no
FROM admission a
JOIN users u on a.phone_no LIKE concat(u.phone_no, '__')
WHERE u.phone_no REGEXP  '^(99)+[0-9]+$'

SELECT a.phone_no
FROM admission a
JOIN users u on a.phone_no REGEXP concat('^', u.phone_no, '[0-9]{2}$')
WHERE u.phone_no REGEXP  '^(99)+[0-9]+$'
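The join works because each users row contributes its own pattern, whereas a subquery used as the right-hand operand of REGEXP must return a single value. To try the REGEXP variant on the sample data from the question, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module rather than MySQL. Assumptions: phone numbers are stored as TEXT, SQLite's missing REGEXP operator is supplied by a Python function registered with create_function, and || replaces MySQL's CONCAT().

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite has no built-in REGEXP; it rewrites "X REGEXP Y" as regexp(Y, X).
    return value is not None and re.search(pattern, value) is not None


def matching_admissions():
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    # Column types are an assumption; the question does not show the schema.
    conn.execute("CREATE TABLE users (id INTEGER, phone_no TEXT)")
    conn.execute("CREATE TABLE admission (id INTEGER, phone_no TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "9912678"), (2, "9912323"), (3, "9912366")])
    conn.executemany("INSERT INTO admission VALUES (?, ?)",
                     [(6, "991267823"), (7, "991236621"),
                      (8, "435443455"), (9, "243344333")])
    # REGEXP join from the answer, with || in place of MySQL's CONCAT().
    rows = conn.execute("""
        SELECT a.phone_no
        FROM admission a
        JOIN users u ON a.phone_no REGEXP ('^' || u.phone_no || '[0-9]{2}$')
        WHERE u.phone_no REGEXP '^(99)+[0-9]+$'
        ORDER BY a.phone_no
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(matching_admissions())
```

Only 991267823 (user 9912678 plus "23") and 991236621 (user 9912366 plus "21") come back; the two admission numbers that don't start with 99 are never joined.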

If the number of trailing digits is not fixed, you can also use:

LIKE concat(u.phone_no, '%')

REGEXP concat('^', u.phone_no, '[0-9]*$')

But in this case you might need to use SELECT DISTINCT a.phone_no: if one users.phone_no is a prefix of another users.phone_no (e.g. 99123 and 991234), the same admission number matches both rows.
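A small sketch of why the duplicates appear, again using SQLite via Python's sqlite3 as a stand-in for MySQL. The two user numbers and the admission number 9912345 are hypothetical, chosen so that one user number is a prefix of the other; REGEXP is a registered Python function and || replaces CONCAT().

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    return value is not None and re.search(pattern, value) is not None


def prefix_matches(distinct):
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE users (phone_no TEXT)")
    conn.execute("CREATE TABLE admission (phone_no TEXT)")
    # Hypothetical data: 99123 is a prefix of 991234.
    conn.executemany("INSERT INTO users VALUES (?)", [("99123",), ("991234",)])
    conn.execute("INSERT INTO admission VALUES ('9912345')")
    select = "SELECT DISTINCT" if distinct else "SELECT"
    # Variable-length suffix: [0-9]* instead of [0-9]{2}.
    rows = conn.execute(f"""
        {select} a.phone_no
        FROM admission a
        JOIN users u ON a.phone_no REGEXP ('^' || u.phone_no || '[0-9]*$')
        WHERE u.phone_no REGEXP '^(99)+[0-9]+$'
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(prefix_matches(distinct=False))  # one row per matching user
print(prefix_matches(distinct=True))   # duplicates removed
```

Without DISTINCT, 9912345 is returned twice (once per matching user); with it, once.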

Update

After running some tests with 10K rows in the users table and 100K rows in the admission table, I came to the following query:

SELECT a.phone_no
FROM admission a
JOIN users u 
    ON  a.phone_no >= u.phone_no
    AND a.phone_no < CONCAT(u.phone_no, 'z')
    AND a.phone_no LIKE CONCAT(u.phone_no, '%')
    AND a.phone_no REGEXP CONCAT('^', u.phone_no, '[0-9]*$')
WHERE   u.phone_no LIKE  '99%'
    AND u.phone_no REGEXP  '^(99)+[0-9]*$'
UNION SELECT 0 FROM (SELECT 0) dummy WHERE 0

Fiddle
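To check what this query returns (not how fast it runs), here is a sketch that executes it in SQLite through Python's sqlite3 instead of MySQL. Assumptions: REGEXP is a registered Python function, || replaces CONCAT(), phone numbers are TEXT compared with SQLite's default binary collation, and the data is hypothetical, with 99123 a prefix of 991234 so that 9912345 joins twice.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    return value is not None and re.search(pattern, value) is not None


FINAL_QUERY = """
SELECT a.phone_no
FROM admission a
JOIN users u
    ON  a.phone_no >= u.phone_no
    AND a.phone_no < (u.phone_no || 'z')
    AND a.phone_no LIKE (u.phone_no || '%')
    AND a.phone_no REGEXP ('^' || u.phone_no || '[0-9]*$')
WHERE   u.phone_no LIKE '99%'
    AND u.phone_no REGEXP '^(99)+[0-9]*$'
UNION SELECT 0 FROM (SELECT 0) dummy WHERE 0
"""


def run_final_query():
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE users (phone_no TEXT)")
    conn.execute("CREATE TABLE admission (phone_no TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)",
                     [("99123",), ("991234",), ("9912678",)])
    conn.executemany("INSERT INTO admission VALUES (?)",
                     [("9912345",), ("991267823",), ("435443455",)])
    rows = conn.execute(FINAL_QUERY).fetchall()
    conn.close()
    # UNION guarantees no particular order, so sort for a stable result.
    return sorted(row[0] for row in rows)


print(run_final_query())
```

The empty UNION branch contributes no rows, but the UNION still removes the duplicate 9912345, so each matching admission number appears once.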

This way you can use REGEXP and still get good performance. In my test case, this query executes almost instantly.

Logically, you only need the REGEXP conditions, but on bigger tables the query might time out. A LIKE condition filters the result set before the REGEXP check. However, even with LIKE the query doesn't perform very well: for some reason MySQL doesn't use a range check for the join. So I added an explicit range check:

    ON  a.phone_no >= u.phone_no
    AND a.phone_no < CONCAT(u.phone_no, 'z')

With this range check in place, you can remove the LIKE condition from the JOIN.
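The range check works because, for strings made only of digits, "starts with u.phone_no" is the same as "sorts between u.phone_no and u.phone_no followed by 'z'": every digit sorts before 'z'. A brute-force sketch in Python checks this, assuming a plain code-point (binary) ordering like Python's string comparison; whether a given MySQL collation orders the same way is an assumption worth verifying.

```python
from itertools import product


def in_prefix_range(value, prefix):
    # Mirrors: a.phone_no >= u.phone_no AND a.phone_no < CONCAT(u.phone_no, 'z')
    return prefix <= value < prefix + "z"


def check_equivalence(max_len=4, alphabet="019"):
    # Enumerate every digit string up to max_len over a small alphabet and
    # compare the range test with a real prefix test for every pair.
    strings = ["".join(chars)
               for n in range(1, max_len + 1)
               for chars in product(alphabet, repeat=n)]
    return [(value, prefix)
            for prefix in strings
            for value in strings
            if in_prefix_range(value, prefix) != value.startswith(prefix)]


print(check_equivalence())  # list of mismatching (value, prefix) pairs
```

An empty list means the two tests agree on every pair tried. The range form is useful because it can be answered from an index on phone_no, while LIKE with a computed pattern and REGEXP cannot.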

The UNION line replaces DISTINCT. MySQL seems to translate DISTINCT into a GROUP BY, which doesn't perform well. By using UNION with an empty result set, I force MySQL to remove duplicates after the SELECT. If you use a fixed number of trailing digits, you can remove that line.

You can adjust the REGEXP patterns to your needs:

...
    AND a.phone_no REGEXP CONCAT('^', u.phone_no, '[0-9]{2}$')
...
    AND u.phone_no REGEXP  '^(99)+[0-9]{8}$'
...
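Note that (99)+ means "one or more repetitions of 99", not just "starts with 99", and [0-9]{8}$ then requires exactly eight more digits. The sketch below uses Python's re module as a stand-in for MySQL's REGEXP; for a pattern this simple (anchors, a group, bounded repetition) the two should agree, but that is an assumption. With the 7-digit sample numbers in the question, this particular pattern would not match anything, so treat the {8} as an example to adapt.

```python
import re

# User-side pattern from the answer; Python's re stands in for MySQL REGEXP.
USER_PATTERN = re.compile(r"^(99)+[0-9]{8}$")


def matches_user_pattern(phone_no):
    # True when the number is one or more "99" pairs followed by exactly 8 digits.
    return USER_PATTERN.search(phone_no) is not None


for number in ["9912345678", "999912345678", "99991234567", "9912678"]:
    print(number, matches_user_pattern(number))
```

The 10- and 12-digit numbers match; the 11-digit one does not, because no split into "99" pairs leaves exactly eight digits.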

If you only need REGEXP to check the length of phone_no, you can also use a LIKE condition with the '_' placeholder:

    AND a.phone_no LIKE CONCAT(u.phone_no, '__')
...
    AND u.phone_no LIKE '99________'
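In LIKE, '_' matches exactly one character and there are no anchors, so a trailing '$' would be matched literally. A sketch that evaluates single LIKE comparisons in SQLite through Python's sqlite3 (assuming SQLite's '_' and '%' behave like MySQL's for these digit strings) makes this concrete. Keep in mind that '_' accepts any character, so this checks length, not that the characters are digits.

```python
import sqlite3


def like(value, pattern):
    # Evaluate one LIKE comparison in an in-memory SQLite database.
    conn = sqlite3.connect(":memory:")
    result = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0]
    conn.close()
    return bool(result)


TEN_CHARS_99 = "99" + "_" * 8  # '99' followed by exactly eight characters

print(like("9912345678", TEN_CHARS_99))        # ten characters
print(like("99123456789", TEN_CHARS_99))       # eleven characters
print(like("9912345678", TEN_CHARS_99 + "$"))  # '$' is a literal, not an anchor
print(like("991267823", "9912678" + "__"))     # two trailing characters
```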

Or combine a LIKE condition with a CHAR_LENGTH() check.
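A sketch of the LIKE-plus-length variant, run in SQLite via Python's sqlite3. SQLite has no CHAR_LENGTH(); its length() returns the character count for TEXT values and is used here in its place. The table contents are hypothetical, and the 10-character target is chosen to match the '99________' example.

```python
import sqlite3


def ten_char_99_numbers(numbers):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (phone_no TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(n,) for n in numbers])
    # MySQL would use CHAR_LENGTH(phone_no) = 10; SQLite's length() stands in.
    rows = conn.execute(
        "SELECT phone_no FROM users "
        "WHERE phone_no LIKE '99%' AND length(phone_no) = 10 "
        "ORDER BY phone_no"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(ten_char_99_numbers(["9912345678", "9912678", "8812345678", "99123456789"]))
```

As with the '_' placeholders, this checks the prefix and the length only; unlike [0-9] in a REGEXP, it does not verify that every character is a digit.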

