SQL:自首次出现某些值以来的行数:继续计数 [英] SQL: Count of rows since certain value first occurred: keep counting
问题描述
这与以下情况类似 SQL:自首次出现某些值以来的行数
在SQL Server中,我试图计算自过去5天以来首次观察到与今天相同的天气(今天假设为2018年8月6日)的天数.每个城镇.
In SQL Server, I'm trying to calculate the count of days since the same weather as today (let's assume today is 6th August 2018) was observed first in the past 5 days. Per town.
以下是数据:
+---------+---------+--------+--------+--------+
| Date | Toronto | Cairo | Zagreb | Ankara |
+---------+---------+--------+--------+--------+
| 1.08.18 | Rain | Sun | Clouds | Sun |
| 2.08.18 | Sun | Sun | Clouds | Sun |
| 3.08.18 | Rain | Sun | Clouds | Rain |
| 4.08.18 | Clouds | Sun | Clouds | Clouds |
| 5.08.18 | Rain | Clouds | Rain | Rain |
| 6.08.18 | Rain | Sun | Sun | Sun |
+---------+---------+--------+--------+--------+
这需要表现良好,但到目前为止,我只想针对每个镇进行单个查询(并且将会有数十个镇,而不仅仅是四个镇).这行得通,但不会扩展规模.
This needs to perform well but all I came up with so far is single queries for each town (and there are going to be dozens of towns, not just the four). This works but is not going to scale.
这里是多伦多的那个...
Here's the one for Toronto...
SELECT
DATEDIFF(DAY, MIN([Date]), GETDATE()) + 1
FROM
(SELECT TOP 5 *
FROM Weather
WHERE [Date] <= GETDATE()
ORDER BY [Date] DESC) a
WHERE
Toronto = (SELECT TOP 1 Toronto
FROM Weather
WHERE DataDate = GETDATE())
...由于今天有雨,而过去5天内的第一次降雨是8月3日,因此正确返回4.
...which correctly returns 4 since today there is rain and the first occurrence of rain within the past 5 days was 3rd August.
但是我要返回的是一个像这样的表:
But what I want returned is a table like this:
+---------+-------+--------+--------+
| Toronto | Cairo | Zagreb | Ankara |
+---------+-------+--------+--------+
| 4 | 5 | 1 | 5 |
+---------+-------+--------+--------+
此代码由@Used_By_Already从接受的答案中进行了稍微修改:
Slightly modified from the accepted answer by @Used_By_Already is this code:
CREATE TABLE mytable(
Date date NOT NULL
,Toronto VARCHAR(9) NOT NULL
,Cairo VARCHAR(9) NOT NULL
,Zagreb VARCHAR(9) NOT NULL
,Ankara VARCHAR(9) NOT NULL
);
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180801','Rain','Sun','Clouds','Sun');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180802','Sun','Sun','Clouds','Sun');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180803','Rain','Sun','Clouds','Rain');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180804','Clouds','Sun','Clouds','Clouds');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180805','Rain','Clouds','Rain','Rain');
INSERT INTO mytable(Date,Toronto,Cairo,Zagreb,Ankara) VALUES ('20180806','Rain','Sun','Sun','Sun');
with cte as (
select
date, city, weather
FROM (
SELECT * from mytable
) AS cp
UNPIVOT (
Weather FOR City IN (Toronto, Cairo, Zagreb, Ankara)
) AS up
)
select
date, city, weather, datediff(day,ca.prior,cte.date)+1 as daysPresent
from cte
cross apply (
select min(prev.date) as prior
from cte as prev
where prev.city = cte.city
and prev.date between dateadd(day,-4,cte.date) and dateadd(day,0,cte.date)
and prev.weather = cte.weather
) ca
order by city,date
输出:
但是,我现在正在尝试的是即使在有问题的过去五天之后,也要继续计算"daysPresent".这意味着输出样本中最后一个标记的行应显示为6.逻辑是,如果它们之间的间隔少于5天,则以它们之间的天数增加前一个数字.如果过去5天的天气不同,请返回1.
However, what I'm trying now is to keep counting "daysPresent" up even after those five past days in question. Meaning that the last marked row in the output sample should show 6. The logic being to increase the previous number by the count of days between them if there is less than 5 days of a gap between them. If there has not been the same weather in the past 5 days, go back to 1.
我尝试了LEAD和LAG,但无法使其正常工作.是在其上添加另一层的正确方法,还是查询看起来需要完全不同?
I experimented with LEAD and LAG but cannot get it to work. Is it even the right way to add another layer to it or would the query need to look different entirely?
我是一个但很困惑的人.
I'm a but puzzled.
推荐答案
您的数据结构存在重大问题.值应该在行中,而不是在列中.因此,从以下开始:
You have a major problem with your data structure. The values should be in rows, not columns. So, start with:
select d.dte, v.*from data d cross apply
(values ('Toronto', Toronto), ('Cairo', Cairo), . . .
) v(city, val)
where d.date >= dateadd(day, -5, getdate());
从那里,我们可以使用窗口函数first_value()
(或last_value()
)来获取最新的读数.其余的只是按城市汇总:
From there, we can use the window function first_value()
(or last_value()
) to get the most recent reading. The rest is just aggregation by city:
with d as (
select d.dte, v.*,
first_value(v.val) over (partition by v.city order by d.dte desc) as last_val
from data d cross apply
(values ('Toronto', Toronto), ('Cairo', Cairo), . . .
) v(city, val)
where d.date >= dateadd(day, -5, getdate())
)
select city, datediff(day, min(dte), getdate()) + 1
from d
where val = last_val
group by city;
这将以行而不是列的形式为您提供所需的信息.您可以根据需要重新旋转.但是我建议您将数据和城市数据保留在不同的行中.
This gives you the information you want, in rows rather than columns. You can re-pivot if you really want. But I advise you to keep the data with city data in different rows.
这篇关于SQL:自首次出现某些值以来的行数:继续计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!