SQL:自首次出现某些值以来的行数:继续计数 [英] SQL: Count of rows since certain value first occurred: keep counting

查看:156
本文介绍了SQL:自首次出现某些值以来的行数:继续计数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这与以下情况类似 SQL:自首次出现某些值以来的行数

在SQL Server中,我试图计算自过去5天以来首次观察到与今天相同的天气(今天假设为2018年8月6日)的天数.每个城镇.

In SQL Server, I'm trying to calculate the count of days since the same weather as today (let's assume today is 6th August 2018) was observed first in the past 5 days. Per town.

以下是数据:

+---------+---------+--------+--------+--------+
| Date    | Toronto | Cairo  | Zagreb | Ankara |
+---------+---------+--------+--------+--------+
| 1.08.18 | Rain    | Sun    | Clouds | Sun    |
| 2.08.18 | Sun     | Sun    | Clouds | Sun    |
| 3.08.18 | Rain    | Sun    | Clouds | Rain   |
| 4.08.18 | Clouds  | Sun    | Clouds | Clouds |
| 5.08.18 | Rain    | Clouds | Rain   | Rain   |
| 6.08.18 | Rain    | Sun    | Sun    | Sun    |
+---------+---------+--------+--------+--------+

这需要表现良好,但到目前为止,我只想针对每个镇进行单个查询(并且将会有数十个镇,而不仅仅是四个镇).这行得通,但不会扩展规模.

This needs to perform well but all I came up with so far is single queries for each town (and there are going to be dozens of towns, not just the four). This works but is not going to scale.

这里是多伦多的那个...

Here's the one for Toronto...

SELECT 
    DATEDIFF(DAY, MIN([Date]), GETDATE()) + 1 
FROM
    (SELECT TOP 5 * 
     FROM Weather 
     WHERE [Date] <= GETDATE()
     ORDER BY [Date] DESC) a
WHERE 
    Toronto = (SELECT TOP 1 Toronto 
               FROM Weather
               WHERE DataDate = GETDATE())

...由于今天有雨,而过去5天内的第一次降雨是8月3日,因此正确返回4.

...which correctly returns 4 since today there is rain and the first occurrence of rain within the past 5 days was 3rd August.

但是我要返回的是一个像这样的表:

But what I want returned is a table like this:

+---------+-------+--------+--------+
| Toronto | Cairo | Zagreb | Ankara |
+---------+-------+--------+--------+
| 4       | 5     | 1      | 5      |
+---------+-------+--------+--------+

此代码由@Used_By_Already从接受的答案中进行了稍微修改:

Slightly modified from the accepted answer by @Used_By_Already is this code:

CREATE TABLE mytable(
   Date    date  NOT NULL
  ,Toronto VARCHAR(9) NOT NULL
  ,Cairo   VARCHAR(9) NOT NULL
  ,Zagreb  VARCHAR(9) NOT NULL
  ,Ankara  VARCHAR(9) NOT NULL
);

INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180801','Rain','Sun','Clouds','Sun');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180802','Sun','Sun','Clouds','Sun');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180803','Rain','Sun','Clouds','Rain');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180804','Clouds','Sun','Clouds','Clouds');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180805','Rain','Clouds','Rain','Rain');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180806','Rain','Sun','Sun','Sun');

with cte as (
        select
              date, city, weather
        FROM (
              SELECT * from mytable
             ) AS cp
        UNPIVOT (
                  Weather FOR City IN (Toronto, Cairo, Zagreb, Ankara)
            ) AS up
    )

select 
        date, city, weather, datediff(day,ca.prior,cte.date)+1 as daysPresent
from cte
cross apply (
    select min(prev.date) as prior
    from cte as prev 
    where prev.city = cte.city
    and prev.date between dateadd(day,-4,cte.date) and dateadd(day,0,cte.date)
    and prev.weather = cte.weather
    ) ca

order by city,date

输出:

但是,我现在正在尝试的是即使在有问题的过去五天之后,也要继续计算"daysPresent".这意味着输出样本中最后一个标记的行应显示为6.逻辑是,如果它们之间的间隔少于5天,则以它们之间的天数增加前一个数字.如果过去5天的天气不同,请返回1.

However, what I'm trying now is to keep counting "daysPresent" up even after those five past days in question. Meaning that the last marked row in the output sample should show 6. The logic being to increase the previous number by the count of days between them if there is less than 5 days of a gap between them. If there has not been the same weather in the past 5 days, go back to 1.

我尝试了LEAD和LAG,但无法使其正常工作.是在其上添加另一层的正确方法,还是查询看起来需要完全不同?

I experimented with LEAD and LAG but cannot get it to work. Is it even the right way to add another layer to it or would the query need to look different entirely?

我是一个但很困惑的人.

I'm a but puzzled.

推荐答案

您的数据结构存在重大问题.值应该在中,而不是在中.因此,从以下开始:

You have a major problem with your data structure. The values should be in rows, not columns. So, start with:

select d.dte, v.*from data d cross apply
     (values ('Toronto', Toronto), ('Cairo', Cairo), . . .
     ) v(city, val)
where d.date >= dateadd(day, -5, getdate());

从那里,我们可以使用窗口函数first_value()(或last_value())来获取最新的读数.其余的只是按城市汇总:

From there, we can use the window function first_value() (or last_value()) to get the most recent reading. The rest is just aggregation by city:

with d as (
      select d.dte, v.*,
             first_value(v.val) over (partition by v.city order by d.dte desc) as last_val
      from data d cross apply
           (values ('Toronto', Toronto), ('Cairo', Cairo), . . .
           ) v(city, val)
      where d.date >= dateadd(day, -5, getdate())
     )
select city, datediff(day, min(dte), getdate()) + 1
from d
where val = last_val
group by city;

这将以行而不是列的形式为您提供所需的信息.您可以根据需要重新旋转.但是我建议您将数据和城市数据保留在不同的行中.

This gives you the information you want, in rows rather than columns. You can re-pivot if you really want. But I advise you to keep the data with city data in different rows.

这篇关于SQL:自首次出现某些值以来的行数:继续计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆