Counting non-contiguous values


Problem description

So I have the following structure:

+------+---------------+---------------+----+
| guid | current_level | current_value | pk |
+------+---------------+---------------+----+
| a    |           100 |            12 |  1 |
| a    |           200 |            12 |  2 |
| a    |           200 |            12 |  3 |
| a    |           200 |            12 |  4 |
| a    |           200 |            12 |  6 |
| a    |           300 |            14 |  7 |
| a    |           300 |            12 |  9 |
| a    |           300 |            12 | 10 |
| a    |           300 |            14 | 12 |
| b    |           100 |            10 |  5 |
| b    |           100 |            10 |  8 |
| b    |           200 |            12 | 11 |
| b    |           200 |            12 | 13 |
+------+---------------+---------------+----+
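
To try the queries below, the sample data can be loaded roughly as follows. This is only a sketch: the table name `sensor_logs` comes from the queries in this post, but the column types and the primary key are assumptions, since the question shows only the values.

```sql
-- Column types and the PRIMARY KEY are assumptions; only the values are given.
CREATE TABLE sensor_logs (
    pk            INT PRIMARY KEY,
    guid          VARCHAR(36) NOT NULL,
    current_level INT NOT NULL,
    current_value INT NOT NULL
);

-- The 13 sample rows from the table above.
INSERT INTO sensor_logs (guid, current_level, current_value, pk) VALUES
    ('a', 100, 12,  1),
    ('a', 200, 12,  2),
    ('a', 200, 12,  3),
    ('a', 200, 12,  4),
    ('a', 200, 12,  6),
    ('a', 300, 14,  7),
    ('a', 300, 12,  9),
    ('a', 300, 12, 10),
    ('a', 300, 14, 12),
    ('b', 100, 10,  5),
    ('b', 100, 10,  8),
    ('b', 200, 12, 11),
    ('b', 200, 12, 13);
```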

I wish to count the number of times a and b reached level 200 non-contiguously (300 as well, but just 200 for now), i.e. the result I'm expecting is:

+------+-------+-------+
| guid | level | times |
+------+-------+-------+
| a    |   200 |     1 |
| b    |   200 |     1 |
+------+-------+-------+

(I can't just count unique values, because separate streaks of 200s should be counted separately.)

When I do the following:

set @id = "none";
set @lev = 10; -- arbitrary non-zero starting level

SELECT  guid, current_level , if(@id <> guid, @lev := 10, 0) AS useless, case when @id <> guid then @id := guid else 0 end AS useless2
         , (case when (current_level = 200 AND current_level <> @lev) then 1 else 0 end) as TIMES
        , if(current_level = 200 AND current_level <> @lev, @lev := current_level, 0) AS useless3

 FROM (SELECT * FROM sensor_logs order by guid) as T

I get:

+------+---------------+---------+----------+----------+----------+
| guid | current_level | useless | useless2 | TIMES    | useless3 |
+------+---------------+---------+----------+----------+----------+
| a    |           100 |      10 | a        |        0 |        0 |
| a    |           200 |       0 | 0        |        1 |      200 |
| a    |           200 |       0 | 0        |        0 |        0 |
| a    |           200 |       0 | 0        |        0 |        0 |
| a    |           200 |       0 | 0        |        0 |        0 |
| a    |           300 |       0 | 0        |        0 |        0 |
| a    |           300 |       0 | 0        |        0 |        0 |
| a    |           300 |       0 | 0        |        0 |        0 |
| a    |           300 |       0 | 0        |        0 |        0 |
| b    |           100 |      10 | b        |        0 |        0 |
| b    |           100 |       0 | 0        |        0 |        0 |
| b    |           200 |       0 | 0        |        1 |      200 |
| b    |           200 |       0 | 0        |        0 |        0 |
+------+---------------+---------+----------+----------+----------+

So now summing the TIMES column and grouping by guid should do the trick, i.e.:

set @id = "none";
set @lev = 10; -- arbitrary non-zero starting level

SELECT  guid, current_level , if(@id <> guid, @lev := 10, 0) AS useless, case when @id <> guid then @id := guid else 0 end AS useless2
         , sum(case when (current_level = 200 AND current_level <> @lev) then 1 else 0 end) as TIMES
        , if(current_level = 200 AND current_level <> @lev, @lev := current_level, 0) AS useless3

 FROM (SELECT * FROM sensor_logs order by guid) as T
 GROUP BY guid

But I get the following:

+------+---------------+---------+----------+----------+----------+
| guid | current_level | useless | useless2 | TIMES    | useless3 |
+------+---------------+---------+----------+----------+----------+
| a    |           100 |      10 | a        |        4 |        0 |
| b    |           100 |      10 | b        |        2 |        0 |
+------+---------------+---------+----------+----------+----------+

I do not understand why summing a column that had two 1s (one for each guid) results in 4 and 2.

Is it something I am doing wrong? Or does it have more to do with the underlying mechanics of how the query (and the sum function) is executed?

Recommended answer

Your first query is better written like this:

SELECT  guid, current_level , if(@id <> guid, @lev := 10, 0) AS useless, case when @id <> guid then @id := guid else 0 end AS useless2
         , (case when (current_level = 200 AND current_level <> @lev) then 1 else 0 end) as TIMES
        , if(current_level = 200 AND current_level <> @lev, @lev := current_level, 0) AS useless3

 FROM sensor_logs
 , (SELECT @id := 'none', @lev := 10) var_init_subquery
 ORDER BY guid

Not only is it cleaner to do the ordering explicitly where it is needed rather than in a subquery; ordering in a subquery can also lead to a bad execution plan (which means poor performance if a temporary table is involved).

For your final result, you shouldn't apply the GROUP BY directly. The SELECT expressions (and therefore your variable assignments and calculations) are evaluated after the GROUP BY, so @lev is not updated row by row while SUM is accumulating. That would explain the 4 and 2: every row with current_level = 200 gets counted (four for a, two for b). To do the grouping after your calculations, put your query in a subquery:

SELECT guid, SUM(times) FROM (
    SELECT  guid, current_level , if(@id <> guid, @lev := 10, 0) AS useless, case when @id <> guid then @id := guid else 0 end AS useless2
             , (case when (current_level = 200 AND current_level <> @lev) then 1 else 0 end) as TIMES
            , if(current_level = 200 AND current_level <> @lev, @lev := current_level, 0) AS useless3

     FROM sensor_logs
     , (SELECT @id := 'none', @lev := 10) var_init_subquery
     ORDER BY guid
) sq
GROUP BY guid
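
As an alternative that avoids user variables altogether, here is a minimal sketch using the `LAG` window function. It is not part of the original answer and rests on two assumptions: the server is MySQL 8.0 or later (window functions are not available before that), and `pk` reflects the order in which the rows were logged. A row starts a new streak when the previous row for the same guid has a different level, or when there is no previous row.

```sql
-- Assumes MySQL 8.0+ and that pk gives the chronological order per guid.
SELECT guid,
       current_level AS level,
       COUNT(*)      AS times          -- one counted row per streak start
FROM (
    SELECT guid,
           current_level,
           -- level of the previous row for the same guid (NULL for the first row)
           LAG(current_level) OVER (PARTITION BY guid ORDER BY pk) AS prev_level
    FROM sensor_logs
) AS with_prev
WHERE current_level = 200
  AND (prev_level IS NULL OR prev_level <> current_level)
GROUP BY guid, current_level;
```

On the sample data this should return one row each for a and b with times = 1. Because the comparison is against the immediately preceding row, a later streak of 200s that follows a run of 300s would be counted as a separate streak. Dropping the `current_level = 200` filter gives the streak counts for every level at once.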
