选择一个记录,如果它之前的一个较低的值花费的时间太长并且失败 [英] Select a record just if the one before it has a lower value takes too long and fail

查看:26
本文介绍了选择一个记录,如果它之前的一个较低的值花费的时间太长并且失败的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经就这个主题提出了 2 个问题,并且都得到了很好的答案.

I've already asked 2 questions about this subject and got great answers on both of them.

  1. 选择一条记录,如果前一条记录的值较低

选择一条记录,如果前一条记录具有按月过滤的较低值

(问题 2 目前与主题无关,因为我不需要按月过滤)

现在自从我实现了建议给我的代码,只要我的表不断增长 sql 查询花费的时间越来越长.在开始时,它需要 8 秒才能在 1000 行上执行.现在在一个超过 25,000 行的表上,它只是失败了.

now ever since I implemented the code suggested to me, as long as my table kept growing the sql query took longer and longer. In the start it was taking for it 8 seconds to be executed on 1000 rows. now on a table with more than 25,000 rows it just fails.

您可以在此处观看我的查询 - http://sqlfiddle.com/#!2/5c480/1/0

SELECT  a.ID, DATE_FORMAT(a.Time,'%d/%m/%y') AS T, a.SerialNumber, p.Model,
    b.Remain_Toner_Black BeforeCountBlack,
    a.Remain_Toner_Black AfterCountBlack,
    b.Remain_Toner_Cyan BeforeCountCyan,
    a.Remain_Toner_Cyan AfterCountCyan,
b.Remain_Toner_Magenta BeforeCountMagenta,
    a.Remain_Toner_Magenta AfterCountMagenta,
b.Remain_Toner_Yellow BeforeCountYellow,
    a.Remain_Toner_Yellow AfterCountYellow
FROM    
    (
        SELECT  a.ID, 
                a.Time, 
                a.SerialNumber, 
                a.Remain_Toner_Black,
                a.Remain_Toner_Cyan,
                a.Remain_Toner_Magenta,
                a.Remain_Toner_Yellow,
                (
                    SELECT  COUNT(*)
                    FROM    Reports c
                    WHERE   c.SerialNumber = a.SerialNumber AND
                            c.ID <= a.ID) AS RowNumber
        FROM    Reports a
    ) a
    LEFT JOIN
    (
        SELECT  a.ID, 
                a.Time, 
                a.SerialNumber, 
                a.Remain_Toner_Black,
                a.Remain_Toner_Cyan,
                a.Remain_Toner_Magenta,
                a.Remain_Toner_Yellow,
                (
                    SELECT  COUNT(*)
                    FROM    Reports c
                    WHERE   c.SerialNumber = a.SerialNumber AND
                            c.ID <= a.ID) AS RowNumber
        FROM    Reports a
    ) b ON a.SerialNumber = b.SerialNumber AND
            a.RowNumber = b.RowNumber + 1
INNER JOIN Printers p ON a.SerialNumber = p.SerialNumber
INNER JOIN Customers c ON p.IP = c.IP AND c.Company = 5
WHERE   (b.Remain_Toner_Black < a.Remain_Toner_Black AND b.Remain_Toner_Black >= 0) OR (b.Remain_Toner_Cyan < a.Remain_Toner_Cyan AND b.Remain_Toner_Cyan >= 0) OR (b.Remain_Toner_Magenta < a.Remain_Toner_Magenta AND b.Remain_Toner_Magenta >= 0) OR (b.Remain_Toner_Yellow < a.Remain_Toner_Yellow AND b.Remain_Toner_Yellow >= 0)

我需要使用以下 3 个表才能仅选择属于具有 ID 的特定公司的打印机.

I need to work with the following 3 tables in order to select only the printers belong to a specific company which has an ID.

报告:

ID         SerialNumber         Remain_Toner_Black   
29881      Z30PBAHBB00034E      58   
30001      Z30PBAHBB00034E      98
30200      Z30PBAHBB00034E      70
30205      BVCfdgdfgdf329F      50
30207      BVCfdgdfgdf329F      40
30210      Z30PBAHBB00034E      50
30301      Z30PBAHBB00034E      100

打印机:

IP                 SerialNumber         Customer 
80.179.228.81      Z30PBAHBB00034E      52

客户:

ID         IP                      Company        
52         80.179.228.81           5

我的查询完美运行并返回:

my query works perfectly and return:

ID         SerialNumber            BEFORECOUNTBLACK         AFTERCOUNTBLACK        
30001      Z30PBAHBB00034E         58                       98
30301      Z30PBAHBB00034E         50                       100

但是,当我在 Reports 表中有 25,000 行的表上运行它时,它现在失败了.

but again, it fails now when I'm running it on a table with 25,000 rows in the Reports table.

推荐答案

这是针对您的问题 1 的解决方案,它会运行得更快,因为您有许多全表扫描和相关子查询.在这里,您最多只能进行一次表扫描(可能还有一个临时表,这取决于您的数据有多大以及您拥有多少内存).我认为您可以在此处轻松将其调整为您的问题.问题 2(我还没有真正阅读过)可能也已回答,因为现在很容易添加 where date_column = 任何内容

Here's a solution for your question 1 which will run much faster, since you have many full table scans and dependent subqueries. Here you will at most have just one table scan (and maybe a temporary table, depending how large your data is and how much memory you've got). I think you can easily adjust it to your question here. Question 2 (I haven't read it really) is probably also answered since it's easy now to just add where date_column = whatever

select * from (
    select
    t.*,
    if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
    @prev_sn := SerialNumber,
    @prev_toner := Remain_Toner_Black
    from
    Table1 t
    , (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber limit 1) var_init
    order by SerialNumber, id
) sq  
where select_it = 1

  • sqlfiddle 中实时查看它的运行情况
  • 说明:

    有了这条线

        , (select @prev_toner:=0, @prev_sn:=SerialNumber from Table1 order by SerialNumber 
    

    我们只是动态地初始化变量@prev_toner@prev_sn.这和查询中根本没有这一行而是写在查询前面是一样的

    we just initialize the variables @prev_toner and @prev_sn on the fly. It's the same as not having this line in the query at all but writing in front of the query

    SET @prev_toner = 0;
    SET @prev_sn = (select serialnumber from your_table order by serialnumber limit 1);
    SELECT ...
    

    那么,为什么查询要为@prev_sn 赋值,为什么要按序列号排序?顺序非常重要.如果没有 order by,则无法保证返回行的顺序.此外,我们将使用变量访问前一行的值,因此将相同的序列号组合在一起"很重要.

    So, why do the query to assign a value to @prev_sn and why order by serialnumber? The order by is very important. Without an order by there's no guaranteed order in which rows are returned. Also we will access the previous rows value with variables, so it's important that same serial numbers are "grouped together".

    select 子句中的列会一个接一个地计算,所以首先选择这一行很重要

    The columns in the select clause are evaluated one after another, so it's important that you first select this line

    if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
    

    在选择这两行之前

    @prev_sn := SerialNumber,
    @prev_toner := Remain_Toner_Black
    

    这是为什么?最后两行仅将当前行的值分配给变量.因此在这一行

    Why is that? The last two lines assign just the values of the current rows to the variables. Therefor in this line

    if(@prev_toner < Remain_Toner_Black and @prev_sn = SerialNumber, 1, 0) as select_it,
    

    变量仍然保存前一行的值.而我们在这里所做的无非是说如果 Remain_Toner_Black 列中的前一行值小于当前行中的值并且前一行序号与实际行序号相同, 返回 1,否则返回 0."

    the variables still hold the values of the previous rows. And what we do here is nothing more than saying "if the previous rows value in column Remain_Toner_Black is smaller than the one in the current row and the previous rows serial number is the same as the actual rows serial number, return 1, else return 0."

    然后我们可以简单地在外部查询中说选择每一行,上面返回 1".

    Then we can simply say in the outer query "select every row, where the above returned 1".

    根据您的查询,您不需要所有这些子查询.它们非常昂贵且不必要.其实这很疯狂.在这部分查询

    Given your query, you don't need all these subqueries. They are very expensive and unnecessary. Actually it's quite insane. In this part of the query

        SELECT  a.ID, 
                a.Time, 
                a.SerialNumber, 
                a.Remain_Toner_Black,
                a.Remain_Toner_Cyan,
                a.Remain_Toner_Magenta,
                a.Remain_Toner_Yellow,
                (
                    SELECT  COUNT(*)
                    FROM    Reports c
                    WHERE   c.SerialNumber = a.SerialNumber AND
                            c.ID <= a.ID) AS RowNumber
        FROM    Reports a
    

    您选择整个表格,然后为每一行计算该组中的行数.这是一个依赖子查询.一切只是为了有某种行号.然后你第二次这样做,这样你就可以加入这两个临时表来获得前一行.真的,难怪表现太差了.

    you select the whole table and for every row you count the rows within that group. That's a dependent subquery. All just to have some sort of row number. Then you do this a second time, just so you can join those two temporary tables to get the previous row. Really, no wonder the performance is horrible.

    那么,如何根据您的查询调整我的解决方案?而不是我用来获取 Remain_Toner_Black 的前一行的一个变量,而是使用四种颜色来表示黑色、青色、品红色和黄色.只需像您一样加入打印机和客户表.不要忘记下单,你就完成了.

    So, how to adjust my solution to your query? Instead of the one variable I used to get the previous row for Remain_Toner_Black use four for the colours black, cyan, magenta and yellow. And just join the Printers and Customers table like you did already. Don't forget the order by and you're done.

    这篇关于选择一个记录,如果它之前的一个较低的值花费的时间太长并且失败的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆