使用GROUP BY与FIRST_VALUE和LAST_VALUE [英] Using GROUP BY with FIRST_VALUE and LAST_VALUE

查看:1529
本文介绍了使用GROUP BY与FIRST_VALUE和LAST_VALUE的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

  CREATE TABLE#这个数据是以1分钟的间隔存储的, MinuteData 

[Id] INT,
[MinuteBar] DATETIME,
[Open] NUMERIC(12,6),
[High] NUMERIC(12,6 ),
[低] NUMERIC(12,6),
[关闭] NUMERIC(12,6)
);

INSERT INTO #MinuteData
([Id],[MinuteBar],[Open],[High],[Low],[Close])
VALUES(1, 2015-01-01 17:00:00',1.557870,1.557880,1.557870,1.557880),
(2,'2015-01-01 17:01:00',1.557900,1.557900,1.557880,1.557880),
(3,'2015-01-01 17:02:00',1.557960,1.558070,1.557960,1.558040),
(4,'2015-01-01 17:03:00',1.558080 ,1.558100,1.558040,1.558050),
(5,'2015-01-01 17:04:00',1.558050,1.558100,1.558020,1.558030),
(6,'2015-01-01 17:05:00',1.558580,1.558710,1.557870,1.557950),
(7,'2015-01-01 17:06:00',1.557910,1.558120,1.557910,1.557990),
( 8,'2015-01-01 17:07:00',1.557940,1.558250,1.557940,1.558170),
(9,'2015-01-01 17:08:00',1.558140,1.558200,1.558080, 1.558120),
(10,'2015-01-01 17:09:00',1.558110,1.558140,1.557970,1.557970);

SELECT *
FROM #MinuteData;

DROP TABLE #MinuteData;

这些值跟踪货币汇率,因此对于每个分钟时间间隔(bar),都有打开价格作为分钟开始和分钟关闭价格。 高位低位值代表每个分钟的最高和最低价格。



期望输出



我希望将这些数据重新格式化为5分钟的时间间隔,以产生以下输出:

  MinuteBar打开关闭低点
2015-01-01 17:00:00 00.000 1.557870 1.558030 1.557870 1.558100
2015-01-01 17:05:00.000 1.558580 1.557970 1.557870 1.558710

这需要从5的第一分钟开始值,从5的最后一分钟开始关闭值。 >高值表示最高和最低

当前解决方案



我有一个解决方案可以做到这一点(下面),但是它依赖于 id 值和自连接。另外,我打算在更大的数据集上运行它,所以我希望尽可能以更高效的方式执行它。

  - 创建一个列,允许在5分钟内分组间隔
SELECT Id,MinuteBar,[Open],High,Low,[Close],
DATEDIFF(MINUTE,'2015-01-01T00:00: 00',MinuteBar)/ 5 AS Interval
INTO#5MinuteData
FROM #MinuteData
ORDER BY分钟

- 按自身加入之前的集合和集合计算
SELECT Interval,
MIN(MinuteBar)AS MinuteBar,
MIN(Id)AS OpenId,
MAX(Id)AS CloseId,
MIN(Low)AS Low,
MAX(高)AS高
INTO #DataMinMax
FROM#5MinuteData
GROUP BY Interval;

- 自加入以获取打开和关闭值
SELECT t1.Interval,
t1.MinuteBar,
tOpen。​​[打开],
tClose。[Close],
t1.Low,
t1.High
FROM #DataMinMax t1
INNER JOIN#5MinuteData tOpen ON tOpen.Id = OpenId
INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;

DROP TABLE #DataMinMax
DROP TABLE#5MinuteData

返工尝试



除了上述查询外,我一直在使用 FIRST_VALUE LAST_VALUE ,因为它似乎是我所追求的,但我无法完全理解我正在做的分组。可能有比我想要做的更好的解决方案,所以我愿意接受建议。目前我正在尝试这样做:

  SELECT MIN(MinuteBar)MinuteBar5,
FIRST_VALUE([Open]) OVER(由MinuteBar定单)AS开仓
MAX(高)AS高,
MIN(低)AS低,
LAST_VALUE([Close])OVER(由MinuteBar定单)AS结束,
DATEDIFF(MINUTE,'2015-01-01 00:00:00',MinuteBar)/ 5 AS间隔
从#MinuteData
GROUP BY DATEDIFF(MINUTE,'2015-01-01 00:00:00',MinuteBar)/ 5

这给了我下面的错误,到 FIRST_VALUE LAST_VALUE 作为查询运行时,如果我删除这些行:


列'#MinuteData.MinuteBar'在选择列表中无效,因为它不包含在聚合函数或GROUP BY子句中。



解决方案

  SELECT 
MIN(MinuteBar)AS MinuteBar5,
Openi ng,
MAX(高)AS高,
MIN(低)AS低,
收盘,
区间
FROM

SELECT FIRST_VALUE([Open])OVER(PARTITION BY DATEDIFF(MINUTE,'2015-01-01 00:00:00',MinuteBar)/ 5 ORDER BY MinuteBar)AS Opening,
FIRST_VALUE([Close])OVER PARTITION BY DATEDIFF(MINUTE,'2015-01-01 00:00:00',MinuteBar)/ 5 ORDER BY MinuteBAR DESC)截止日期,
DATEDIFF(MINUTE,'2015-01-01 00:00:00 ',MinuteBar)/ 5 AS Interval,
*
FROM #MinuteData
)AS T
GROUP BY区间,开门,关门

解决方案接近您当前的解决方案。有两个地方你做错了。


  1. FIRST_VALUE和LAST_VALUE是分析函数,它可以在窗口或分区,而不是一个组。您可以单独运行嵌套查询并查看其结果。

  2. LAST_VALUE是当前窗口的最后一个值,该值在查询中未指定,默认窗口为当前分区的第一行到当前行。您可以使用FIRST_VALUE进行取消订单或指定一个窗口。

      LAST_VALUE([Close])OVER(PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00',MinuteBar)/ 5 
    按分钟栏顺序排列的
    无限上行与无限制之间的行)截止,



I'm working with some data that is currently stored in 1 minute intervals that looks like this:

CREATE TABLE #MinuteData
    (
      [Id] INT ,
      [MinuteBar] DATETIME ,
      [Open] NUMERIC(12, 6) ,
      [High] NUMERIC(12, 6) ,
      [Low] NUMERIC(12, 6) ,
      [Close] NUMERIC(12, 6)
    );

INSERT  INTO #MinuteData
        ( [Id], [MinuteBar], [Open], [High], [Low], [Close] )
VALUES  ( 1, '2015-01-01 17:00:00', 1.557870, 1.557880, 1.557870, 1.557880 ),
        ( 2, '2015-01-01 17:01:00', 1.557900, 1.557900, 1.557880, 1.557880 ),
        ( 3, '2015-01-01 17:02:00', 1.557960, 1.558070, 1.557960, 1.558040 ),
        ( 4, '2015-01-01 17:03:00', 1.558080, 1.558100, 1.558040, 1.558050 ),
        ( 5, '2015-01-01 17:04:00', 1.558050, 1.558100, 1.558020, 1.558030 ),
        ( 6, '2015-01-01 17:05:00', 1.558580, 1.558710, 1.557870, 1.557950 ),
        ( 7, '2015-01-01 17:06:00', 1.557910, 1.558120, 1.557910, 1.557990 ),
        ( 8, '2015-01-01 17:07:00', 1.557940, 1.558250, 1.557940, 1.558170 ),
        ( 9, '2015-01-01 17:08:00', 1.558140, 1.558200, 1.558080, 1.558120 ),
        ( 10, '2015-01-01 17:09:00', 1.558110, 1.558140, 1.557970, 1.557970 );

SELECT  *
FROM    #MinuteData;

DROP TABLE #MinuteData;

The values track currency exchange rates, so for each minute interval (bar), there is the Open price as the minute started and a Close price for the minute end. The High and Low values represent the highest and lowest rate during each individual minute.

Desired Output

I'm looking to reformat this data in to 5 minute intervals to produce the following output:

MinuteBar                Open       Close       Low         High
2015-01-01 17:00:00.000  1.557870   1.558030    1.557870    1.558100
2015-01-01 17:05:00.000  1.558580   1.557970    1.557870    1.558710

This takes the Open value from the first minute of the 5, the Close value from the last minute of the 5. The High and Low values represent the highest high and lowest low rates across the 5 minute period.

Current Solution

I have a solution that does this (below), but it feels inelegant as it relies on id values and self joins. Also, I intend to run it on much larger datasets so I was looking to do it in a more efficient manner if possible:

-- Create a column to allow grouping in 5 minute Intervals
SELECT  Id, MinuteBar, [Open], High, Low, [Close], 
DATEDIFF(MINUTE, '2015-01-01T00:00:00', MinuteBar)/5 AS Interval
INTO    #5MinuteData
FROM    #MinuteData
ORDER BY minutebar

-- Group by inteval and aggregate prior to self join
SELECT  Interval ,
        MIN(MinuteBar) AS MinuteBar ,
        MIN(Id) AS OpenId ,
        MAX(Id) AS CloseId ,
        MIN(Low) AS Low ,
        MAX(High) AS High
INTO    #DataMinMax
FROM    #5MinuteData
GROUP BY Interval;

-- Self join to get the Open and Close values
SELECT  t1.Interval ,
        t1.MinuteBar ,
        tOpen.[Open] ,
        tClose.[Close] ,
        t1.Low ,
        t1.High
FROM    #DataMinMax t1
        INNER JOIN #5MinuteData tOpen ON tOpen.Id = OpenId
        INNER JOIN #5MinuteData tClose ON tClose.Id = CloseId;

DROP TABLE #DataMinMax
DROP TABLE #5MinuteData

Rework Attempt

Instead of the above queries, I've been looking at using FIRST_VALUE and LAST_VALUE, as it seems to be what I'm after, but I can't quite get it working with the grouping that I'm doing. There might be a better solution than what I'm trying to do, so I'm open to suggestions. Currently I'm trying to do this:

SELECT  MIN(MinuteBar) MinuteBar5 ,
        FIRST_VALUE([Open]) OVER (ORDER BY MinuteBar) AS Opening,
        MAX(High) AS High ,
        MIN(Low) AS Low ,
        LAST_VALUE([Close]) OVER (ORDER BY MinuteBar) AS Closing ,
        DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval
FROM    #MinuteData
GROUP BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5

This gives me the below error, which is related to the FIRST_VALUE and LAST_VALUE as the query runs if I remove those lines:

Column '#MinuteData.MinuteBar' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

解决方案

SELECT 
    MIN(MinuteBar) AS MinuteBar5,
    Opening,
    MAX(High) AS High,
    MIN(Low) AS Low,
    Closing,
    Interval
FROM 
(
    SELECT FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
           FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
           DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
           *
    FROM #MinuteData
) AS T
GROUP BY Interval, Opening, Closing

A solution close to your current one. There are two places you did wrong.

  1. FIRST_VALUE AND LAST_VALUE are Analytic Functions, which work on a window or partition, instead of a group. You can run the nested query alone and see its result.
  2. LAST_VALUE is the last value of current window, which is not specified in your query, and a default window is rows from the first row of current partition to current row. You can either use FIRST_VALUE with deseeding order or specify a window

    LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 
                ORDER BY MinuteBar 
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,
    

这篇关于使用GROUP BY与FIRST_VALUE和LAST_VALUE的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆