计数查询花费的时间太长-已超过24小时 [英] Count query is taking too long - over 24 hours have passed

查看:131
本文介绍了计数查询花费的时间太长-已超过24小时的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

好吧,很可能我做错了什么,但是按照用户的建议,我在这里运行此查询:

Ok, most possibly I am doing something wrong, but following the advice of a user here I run this query:

SELECT id, item, 
   (SELECT COUNT(item) FROM Table1 WHERE id=a.id AND item=a.item) cnt
FROM (SELECT DISTINCT a.id,b.item FROM Table1 a, Table1 b) a
ORDER BY id, item;

在此表上:

ID         ITEM
-----------------
0001        345
0001        345
0001        120
0002        567
0002        034
0002        567
0003        567
0004        533
0004        008
...

为了得到这个结果:

ID  ITEM    CNT
1   8       0
1   34      0
1   120     1
1   345     2
1   533     0
1   567     0
2   8       0
2   34      1
...

,但是花费的时间太长,一天后查询仍在运行... 有没有办法提高性能?我大约有400万行

but it is taking too long and the query is still running after a day... Is there a way to improve performance? I have about 4 million rows

谢谢

推荐答案

您的查询非常复杂.我认为您只想计算iditem的组合.如果是这样,这是一个简单的聚合:

Your query is quite convoluted. I think you just want to count the combinations of id and item. If so, this is a simple aggregation:

select id, item, count(*)
from Table1 a
group by id, item;

如果要显示所有ID和项目,请使用驱动程序表:

If you want all ids and items to appear, then use a driver table:

select driver.id, driver.item, coalesce(count(t1.id), 0)
from (select id.id, item.item
      from (select distinct id from Table1) id cross join
           (select distinct item from Table1) item
     ) driver left outer join
     Table1 t1
     on driver.id = t1.id and driver.item = t1.item
group by driver.id, driver.item;

原始查询具有以下语句:

The original query has this statement:

 (SELECT DISTINCT a.id,b.item FROM Table1 a, Table1 b) a

这是在做完整的笛卡尔积,然后做一个不同的.因此,如果您的表有100,000行,则中间表有10,000,000,000行用于不同的数据(我认为MySQL对此没有更好的优化).先进行区分(对于驱动程序而言),可大大减少数据量.

This is doing full cartesian product and then doing a distinct. So, if your table has 100,000 rows, then the intermediate table has 10,000,000,000 rows for the distinct (I don't think MySQL optimizes this a bit better). Doing the distinct first (as for the driver) greatly reduces the volume of data.

有一类SQL问题,您需要查看两个或更多项目的所有组合,然后确定每个人的值(甚至是数据中不存在的值),或者查找不属于的值. .这些问题带来了同样的问题:您如何获取有关数据中未包含的值的信息?

There are a class of SQL questions where you need to look at all combinations of two or more items and then determine values for everyone (even those that don't exist in the data) or find those that are not in the data. These problems pose the same problem: how do you get information about values not in the data?

我主张的解决方案是创建一个具有所有可能组合的表,然后对其余表使用left [outer] join.我将其称为驱动程序"表,因为此查询中的行通过定义后续联接的填充量来驱动"查询.

The solution that I advocate is to create a table that has all possible combinations, and then use left [outer] join for the remaining tables. I call this the "driver" table, because the rows in this query "drive" the query by defining the population for subsequent joins.

此术语与注释中的参考完全一致.从优化程序的角度来看,此评论使用的是术语.一些连接算法(特别是嵌套循环和索引查找)对连接的两侧有不同的对待.对于这些,一侧是驾驶/驱动程序"表.例如,当从大表联接到小参考表时,大表是驱动表,而另一个表则通过索引访问.其他联接算法(例如,合并联接和哈希联接(通常))对双方相同,因此该概念在此处不太适用.

This terminology is fairly consistent with the reference in the comment. The comment is using the term from the optimizer perspective. Some join algorithms -- particularly nested loop and index lookup -- treat the two sides of the join differently; for these, one side is the "driving/driver" table. For instance, when joining from a large table to a small reference table, the large table is the driving table and the other table is accessed through an index. Other join algorithms -- such as merge join and hash joins (in general) -- treat both sides the same, so the concept is less applicable there.

从逻辑角度讲,我用它来表示定义总体的查询.一个重要的相似之处是,对于左/右外部联接,实际上这两个定义是相同的.优化程序通常会选择left join中的第一个表作为驱动程序",因为它定义了输出行.

From the logical perspective, I'm using it to mean the query that defines the population. An important similarity is that for a left/right outer join, both definitions are, in practice, the same. The optimizer would typically choose the first table in a left join as the "driver", because it defines the output rows.

这篇关于计数查询花费的时间太长-已超过24小时的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆