MySQL IN clause using sub-select versus list of values

Problem description

I have two levels of filtering I need to do on some related data. The first query looks something like:

SELECT t1.fk_id 
FROM t1 
LEFT JOIN t3 ON t3.fk_id = t1.fk_id
WHERE t1.field1 > 10 AND t3.field2 = 'Y'

The second query runs against another table with the same fk_id field, and looks something like

SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4 
FROM t2 
WHERE fk_id IN (fk_values_from_query_1)
GROUP BY t2.fk_id
HAVING sum_3 > 1000

Now, as far as I can tell, I can run this in two different ways, though I'm not tied to either and am open to other methods as well. I could embed the first query in the second as a sub-select, which I understand to be really bad from a performance perspective. Or I could extract the values from the results of query 1 and embed them as a list in query 2 (in my application code).
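
The two options described above can be sketched as follows. This is only an illustration of the two query shapes, not a benchmark: it assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows, the quoted 'Y' literal, and the helper names with_subselect and with_value_list are all made up for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fk_id INTEGER, field1 INTEGER);
    CREATE TABLE t3 (fk_id INTEGER, field2 TEXT);
    CREATE TABLE t2 (fk_id INTEGER, field3 INTEGER, field_4 INTEGER);
    INSERT INTO t1 VALUES (1, 20), (2, 5), (3, 30);
    INSERT INTO t3 VALUES (1, 'Y'), (2, 'Y'), (3, 'N');
    INSERT INTO t2 VALUES (1, 600, 1), (1, 500, 2), (2, 2000, 3), (3, 5000, 4);
""")

# Query 1 from the question (hypothetical sample data above).
QUERY_1 = """
    SELECT t1.fk_id
    FROM t1
    LEFT JOIN t3 ON t3.fk_id = t1.fk_id
    WHERE t1.field1 > 10 AND t3.field2 = 'Y'
"""

# Query 2 with a slot for either a sub-select or a list of placeholders.
# HAVING repeats the aggregate so the statement is portable across engines.
QUERY_2 = """
    SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4
    FROM t2
    WHERE t2.fk_id IN ({in_list})
    GROUP BY t2.fk_id
    HAVING SUM(t2.field3) > 1000
"""


def with_subselect(conn):
    """Option 1: embed query 1 as a sub-select inside query 2."""
    return conn.execute(QUERY_2.format(in_list=QUERY_1)).fetchall()


def with_value_list(conn):
    """Option 2: run query 1 first, then pass its values to query 2."""
    ids = [row[0] for row in conn.execute(QUERY_1)]
    if not ids:
        return []  # an empty IN () list is a syntax error in MySQL
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(QUERY_2.format(in_list=placeholders), ids).fetchall()


print(with_subselect(conn))
print(with_value_list(conn))
```

Both helpers return the same rows. The second one costs an extra round trip and builds a statement whose length grows with the number of ids. Note that sqlite3 uses ? placeholders, whereas MySQL drivers such as PyMySQL use %s.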

The two parts to this question are:

  1. Is there any difference, performance-wise, between the two query structures described above?
  2. Is there a better way to structure these two queries?

Benchmarks

I didn't fully test this, but I ran my version and the version posted by Barmar against my data. My query ran in approximately 4.23 seconds, while Barmar's version took only 0.60 seconds. That's an 85% improvement!

Solution

You should combine them using a JOIN:

SELECT t2.fk_id, SUM(t2.field3) AS sum_3, SUM(t2.field_4) AS sum_4 
FROM t2
JOIN (SELECT DISTINCT t1.fk_id
      FROM t1
      JOIN t3 ON t3.fk_id = t1.fk_id
      WHERE t1.field1 > 10 AND t3.field2 = 'Y') t4
ON t2.fk_id = t4.fk_id
GROUP BY t2.fk_id
HAVING sum_3 > 1000

I've consistently found that MySQL performs horribly on WHERE col IN (subquery) queries, compared to the analogous join. I haven't compared it with queries where I substitute the values from the subquery, because I've only done that when it wasn't possible to do it in a single query (e.g. I need to match data on different servers).

BTW, there's no point in using a LEFT JOIN if you're also filtering on values in the table being joined with.
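
To see why the LEFT JOIN buys nothing here, consider a row in t1 with no match in t3: the LEFT JOIN keeps it with t3.field2 set to NULL, and the WHERE condition on t3.field2 then discards it. A small sketch, assuming Python's sqlite3 module as a stand-in for MySQL and made-up sample rows:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fk_id INTEGER, field1 INTEGER);
    CREATE TABLE t3 (fk_id INTEGER, field2 TEXT);
    INSERT INTO t1 VALUES (1, 20), (2, 30), (3, 40);
    -- fk_id 3 deliberately has no row in t3
    INSERT INTO t3 VALUES (1, 'Y'), (2, 'N');
""")

TEMPLATE = """
    SELECT t1.fk_id
    FROM t1
    {join_type} t3 ON t3.fk_id = t1.fk_id
    WHERE t1.field1 > 10 AND t3.field2 = 'Y'
    ORDER BY t1.fk_id
"""

left_rows = conn.execute(TEMPLATE.format(join_type="LEFT JOIN")).fetchall()
inner_rows = conn.execute(TEMPLATE.format(join_type="JOIN")).fetchall()

# The unmatched t1 row (fk_id 3) is dropped by the WHERE clause either way.
print(left_rows, inner_rows)
```

Both queries return only fk_id 1, so the plain JOIN states the intent more directly.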

In all cases, make sure that you have indexes on the keys used in the join or IN clause.
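
As a rough sketch of the indexing advice, the statements below create one index per table on fk_id. The index names are hypothetical, and Python's sqlite3 module again stands in for MySQL; the CREATE INDEX statements themselves use syntax that MySQL also accepts.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (fk_id INTEGER, field1 INTEGER);
    CREATE TABLE t2 (fk_id INTEGER, field3 INTEGER, field_4 INTEGER);
    CREATE TABLE t3 (fk_id INTEGER, field2 TEXT);

    -- One index per table on the key used in the join or IN clause.
    CREATE INDEX idx_t1_fk_id ON t1 (fk_id);
    CREATE INDEX idx_t2_fk_id ON t2 (fk_id);
    CREATE INDEX idx_t3_fk_id ON t3 (fk_id);
""")

# SQLite-specific catalog lookup to confirm the indexes exist.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```

On MySQL itself, SHOW INDEX FROM t2 lists the indexes on a table, and running EXPLAIN on the query shows whether they are actually used.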
