选择一个记录,如果它之前的一个较低的值花费的时间太长并且失败 [英] Select a record just if the one before it has a lower value takes too long and fail
问题描述
我已经就这个主题提出了 2 个问题,并且都得到了很好的答案.
I've already asked 2 questions about this subject and got great answers on both of them.
(问题 2 目前与主题无关,因为我不需要按月过滤)
现在自从我实现了建议给我的代码,只要我的表不断增长 sql 查询花费的时间越来越长.在开始时,它需要 8 秒才能在 1000 行上执行.现在在一个超过 25,000 行的表上,它只是失败了.
now ever since I implemented the code suggested to me, as long as my table kept growing the sql query took longer and longer. In the start it was taking for it 8 seconds to be executed on 1000 rows. now on a table with more than 25,000 rows it just fails.
您可以在此处观看我的查询 - http://sqlfiddle.com/#!2/5c480/1/0
SELECT a.ID, DATE_FORMAT(a.Time,'%d/%m/%y') AS T, a.SerialNumber, p.Model,
b.Remain_Toner_Black BeforeCountBlack,
a.Remain_Toner_Black AfterCountBlack,
b.Remain_Toner_Cyan BeforeCountCyan,
a.Remain_Toner_Cyan AfterCountCyan,
b.Remain_Toner_Magenta BeforeCountMagenta,
a.Remain_Toner_Magenta AfterCountMagenta,
b.Remain_Toner_Yellow BeforeCountYellow,
a.Remain_Toner_Yellow AfterCountYellow
FROM
(
SELECT a.ID,
a.Time,
a.SerialNumber,
a.Remain_Toner_Black,
a.Remain_Toner_Cyan,
a.Remain_Toner_Magenta,
a.Remain_Toner_Yellow,
(
SELECT COUNT(*)
FROM Reports c
WHERE c.SerialNumber = a.SerialNumber AND
c.ID <= a.ID) AS RowNumber
FROM Reports a
) a
LEFT JOIN
(
SELECT a.ID,
a.Time,
a.SerialNumber,
a.Remain_Toner_Black,
a.Remain_Toner_Cyan,
a.Remain_Toner_Magenta,
a.Remain_Toner_Yellow,
(
SELECT COUNT(*)
FROM Reports c
WHERE c.SerialNumber = a.SerialNumber AND
c.ID <= a.ID) AS RowNumber
FROM Reports a
) b ON a.SerialNumber = b.SerialNumber AND
a.RowNumber = b.RowNumber + 1
INNER JOIN Printers p ON a.SerialNumber = p.SerialNumber
INNER JOIN Customers c ON p.IP = c.IP AND c.Company = 5
WHERE (b.Remain_Toner_Black < a.Remain_Toner_Black AND b.Remain_Toner_Black >= 0) OR (b.Remain_Toner_Cyan < a.Remain_Toner_Cyan AND b.Remain_Toner_Cyan >= 0) OR (b.Remain_Toner_Magenta < a.Remain_Toner_Magenta AND b.Remain_Toner_Magenta >= 0) OR (b.Remain_Toner_Yellow < a.Remain_Toner_Yellow AND b.Remain_Toner_Yellow >= 0)
我需要使用以下 3 个表才能仅选择属于具有 ID 的特定公司的打印机.
I need to work with the following 3 tables in order to select only the printers belong to a specific company which has an ID.
报告:
ID SerialNumber Remain_Toner_Black
29881 Z30PBAHBB00034E 58
30001 Z30PBAHBB00034E 98
30200 Z30PBAHBB00034E 70
30205 BVCfdgdfgdf329F 50
30207 BVCfdgdfgdf329F 40
30210 Z30PBAHBB00034E 50
30301 Z30PBAHBB00034E 100
打印机:
IP SerialNumber Customer
80.179.228.81 Z30PBAHBB00034E 52
客户:
ID IP Company
52 80.179.228.81 5
我的查询完美运行并返回:
my query works perfectly and return:
ID SerialNumber BEFORECOUNTBLACK AFTERCOUNTBLACK
30001 Z30PBAHBB00034E 58 98
30301 Z30PBAHBB00034E 50 100
但是,当我在 Reports
表中有 25,000 行的表上运行它时,它现在失败了.
but again, it fails now when I'm running it on a table with 25,000 rows in the Reports
table.
推荐答案
这是针对您的问题 1 的解决方案,它会运行得更快,因为您有许多全表扫描和相关子查询.在这里,您最多只能进行一次表扫描(可能还有一个临时表,这取决于您的数据有多大以及您拥有多少内存).我认为您可以在此处轻松将其调整为您的问题.问题 2(我还没有真正阅读过)可能也已回答,因为现在很容易添加 where date_column = 任何内容
Here's a solution for your question 1 which will run much faster, since you have many full table scans and dependent subqueries. Here you will at most have just one table scan (and maybe a temporary table, depending how large your data is and how much memory you've got). I think you can easily adjust it to your question here. Question 2 (I haven't read it really) is probably also answered since it's easy now to just add where date_column = whatever
select * from (
select
t.*,
if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
@prev_sn := SerialNumber,
@prev_toner := Remain_Toner_Black
from
Table1 t
, (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber limit 1) var_init
order by SerialNumber, id
) sq
where select_it = 1
- 在 sqlfiddle 中实时查看它的运行情况
说明:
有了这条线
, (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber
我们只是动态地初始化变量@prev_toner
和@prev_sn
.这和查询中根本没有这一行而是写在查询前面是一样的
we just initialize the variables @prev_toner
and @prev_sn
on the fly. It's the same as not having this line in the query at all but writing in front of the query
SET @prev_toner = 0;
SET @prev_sn = (select serialnumber from your_table order by serialnumber limit 1);
SELECT ...
那么,为什么查询要为@prev_sn 赋值,为什么要按序列号排序?顺序非常重要.如果没有 order by,则无法保证返回行的顺序.此外,我们将使用变量访问前一行的值,因此将相同的序列号组合在一起"很重要.
So, why do the query to assign a value to @prev_sn and why order by serialnumber? The order by is very important. Without an order by there's no guaranteed order in which rows are returned. Also we will access the previous rows value with variables, so it's important that same serial numbers are "grouped together".
select 子句中的列会一个接一个地计算,所以首先选择这一行很重要
The columns in the select clause are evaluated one after another, so it's important that you first select this line
if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
在选择这两行之前
@prev_sn := SerialNumber,
@prev_toner := Remain_Toner_Black
这是为什么?最后两行仅将当前行的值分配给变量.因此在这一行
Why is that? The last two lines assign just the values of the current rows to the variables. Therefor in this line
if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
变量仍然保存前一行的值.而我们在这里所做的无非是说如果 Remain_Toner_Black 列中的前一行值小于当前行中的值并且前一行序号与实际行序号相同, 返回 1,否则返回 0."
the variables still hold the values of the previous rows. And what we do here is nothing more than saying "if the previous rows value in column Remain_Toner_Black is smaller than the one in the current row and the previous rows serial number is the same as the actual rows serial number, return 1, else return 0."
然后我们可以简单地在外部查询中说选择每一行,上面返回 1".
Then we can simply say in the outer query "select every row, where the above returned 1".
根据您的查询,您不需要所有这些子查询.它们非常昂贵且不必要.其实这很疯狂.在这部分查询
Given your query, you don't need all these subqueries. They are very expensive and unnecessary. Actually it's quite insane. In this part of the query
SELECT a.ID,
a.Time,
a.SerialNumber,
a.Remain_Toner_Black,
a.Remain_Toner_Cyan,
a.Remain_Toner_Magenta,
a.Remain_Toner_Yellow,
(
SELECT COUNT(*)
FROM Reports c
WHERE c.SerialNumber = a.SerialNumber AND
c.ID <= a.ID) AS RowNumber
FROM Reports a
您选择整个表格,然后为每一行计算该组中的行数.这是一个依赖子查询.一切只是为了有某种行号.然后你第二次这样做,这样你就可以加入这两个临时表来获得前一行.真的,难怪表现太差了.
you select the whole table and for every row you count the rows within that group. That's a dependent subquery. All just to have some sort of row number. Then you do this a second time, just so you can join those two temporary tables to get the previous row. Really, no wonder the performance is horrible.
那么,如何根据您的查询调整我的解决方案?而不是我用来获取 Remain_Toner_Black 的前一行的一个变量,而是使用四种颜色来表示黑色、青色、品红色和黄色.只需像您一样加入打印机和客户表.不要忘记下单,你就完成了.
So, how to adjust my solution to your query? Instead of the one variable I used to get the previous row for Remain_Toner_Black use four for the colours black, cyan, magenta and yellow. And just join the Printers and Customers table like you did already. Don't forget the order by and you're done.
这篇关于选择一个记录,如果它之前的一个较低的值花费的时间太长并且失败的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!