使用与周围行数据的间隙距离成正比的值填充数据中的间隙? [英] Fill in gaps in data, using a value proportional to the gap distance to data from the surrounding rows?

查看:19
本文介绍了使用与周围行数据的间隙距离成正比的值填充数据中的间隙?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在不久的某个时候,我将不得不准备一份商品价格清单.粒度为 1 天,在有商品销售的日子里,我将平均价格以获得当天的平均价格.会有几天没有销售,我很适合通过拉动上一次和下一次销售来使用足够的近似值,并且它们之间的每一天都有一个从一个线性过渡到另一个的价格.

At some point soon I'll have to prepare a list of prices of items on days. The granularity is 1 day and on days when there are sales of an item, I'll average the prices to get that day's average. There will be days where no sales are made, and I'm suited that an adequate approximation can be used by pulling the previous and next occurrences of sales, and for each day between them having a price that transitions linearly from one to the other.

想象一下原始数据是:

Item   Date       Price
Bread  2000-01-01 10
Bread  2000-01-02 9.5
Bread  2000-01-04 9.1
Sugar  2000-01-01 100
Sugar  2000-01-11 150

我可以到这里:

Item   Date       Price
Bread  2000-01-01 10
Bread  2000-01-02 9.5
Bread  2000-01-03 NULL
Bread  2000-01-04 9.1
Sugar  2000-01-01 100
Sugar  2000-01-02 NULL
Sugar  2000-01-03 NULL
Sugar  2000-01-04 NULL
Sugar  2000-01-05 NULL
Sugar  2000-01-06 NULL
Sugar  2000-01-07 NULL
Sugar  2000-01-08 NULL
Sugar  2000-01-09 NULL
Sugar  2000-01-10 NULL
Sugar  2000-01-11 150

我想去的地方是:

Item   Date       Price
Bread  2000-01-01 10
Bread  2000-01-02 9.5
Bread  2000-01-03 9.3 --being 9.5 + ((9.1 - 9.5 / 2) * 1)
Bread  2000-01-04 9.1
Sugar  2000-01-01 100
Sugar  2000-01-02 105 --being 100 + (150 - 100 / 10) * 1)
Sugar  2000-01-03 110 --being 100 + (150 - 100 / 10) * 2)
Sugar  2000-01-04 115
Sugar  2000-01-05 120
Sugar  2000-01-06 125
Sugar  2000-01-07 130
Sugar  2000-01-08 135
Sugar  2000-01-09 140
Sugar  2000-01-10 145 --being 100 + (150 - 100 / 10) * 9)
Sugar  2000-01-11 150

到目前为止我尝试了什么?仅思考;我打算做这样的事情:

What have I tried so far? Thinking only; I'm planning on doing something like:

  • 提取原始数据
  • 加入数字/日历表以填充稀疏数据
  • LAST_VALUE()(或第一个?)OVER ROWS UNBOUNDED PRECEDING/FOLLOWING(带有 nulls-last order 子句)从原始数据中获取第一个非 null 的previous_date、following_date、previous_price 和following_price
  • DATEDIFF 假日期和previous_date 以获得天数(这实际上是我们跨越差距的距离,gap_progress)和差距距离(following_date -previous_date)
  • 获取公式的下一个价格、上一个价格和差距距离 (preceding_price + ((next_price -previous_price)/gap_distance) * gap_progress)

然而,我想知道是否有更简单的方法,因为我有数百万个项目日,但感觉效率不高..

I am, however, wondering if there's a simpler way, because I've got millions of item-days and this doesn't feel like it'll be that efficient..

我发现了很多问题示例,其中最后一行或下一行的数据被逐字涂抹以填补空白,但我不记得看到这种尝试进行某种转换的情况.也许这种技术可以双重应用,通过向前运行的涂抹,复制最近的值,并在其旁边进行向后运行的涂抹:

I find lots of examples of questions where the data from the last or next row is smeared verbatim to fill in the gaps, but I don't recall seeing this situation where some kind of transition is attempted. Perhaps this technique can be doubly applied, by having a smear that runs forwards, replicating the most recent value and alongside it a smear that runs backwards:

Item   Date       DateFwd    DateBak     PriceF PriceB
Bread  2000-01-01 2000-01-01 2000-01-01  10     10
Bread  2000-01-02 2000-01-02 2000-01-02  9.5    9.5
Bread  2000-01-03 2000-01-02 2000-01-04  9.5    9.1
Bread  2000-01-04 2000-01-04 2000-01-04  9.1    9.1
Sugar  2000-01-01 2000-01-01 2000-01-01  100    100
Sugar  2000-01-02 2000-01-01 2000-01-11  100    150
Sugar  2000-01-03 2000-01-01 2000-01-11  100    150
Sugar  2000-01-04 2000-01-01 2000-01-11  100    150
Sugar  2000-01-05 2000-01-01 2000-01-11  100    150
Sugar  2000-01-06 2000-01-01 2000-01-11  100    150
Sugar  2000-01-07 2000-01-01 2000-01-11  100    150
Sugar  2000-01-08 2000-01-01 2000-01-11  100    150
Sugar  2000-01-09 2000-01-01 2000-01-11  100    150
Sugar  2000-01-10 2000-01-01 2000-01-11  100    150
Sugar  2000-01-11 2000-01-11 2000-01-11  150    150

这些可能为公式提供必要的数据(preceding_price + ((next_price -previous_price)/gap_distance) * gap_progress):

These might provide the necessary data for the formula (preceding_price + ((next_price - preceding_price)/gap_distance) * gap_progress):

  • gap_distance = DATEDIFF(day, DateFwd, DateBak)
  • gap_progress = DATEDIFF(day, Date, DateFwd)
  • next_price = PriceB
  • preceding_price = PriceF

?

这是我知道可以访问的数据的 DDL(原始数据与日历表连接)

Here's a DDL of the data that I know I can get to (raw data joined with calendar table)

CREATE TABLE Data
([I] varchar(5), [D] date, [P] DECIMAL(10,5))
;

INSERT Data
([I], [D], [P])
VALUES
('Bread', '2000-01-01', 10),
('Bread', '2000-01-02', 9.5),
('Bread', '2000-01-04', 9.1),
('Sugar', '2000-01-01', 100),
('Sugar', '2000-01-11', 150);

CREATE TABLE Cal([D] DATE);
INSERT Cal VALUES
('2000-01-01'),
('2000-01-02'),
('2000-01-03'),
('2000-01-04'),
('2000-01-05'),
('2000-01-06'),
('2000-01-07'),
('2000-01-08'),
('2000-01-09'),
('2000-01-10'),
('2000-01-11');

SELECT d.i as [item], c.d as [date], d.p as [price] FROM
cal c LEFT JOIN data d ON c.d = d.d

推荐答案

更容易将那些缺失的差距连同价格一起生成

it is easier to generate those missing gap together with the Price in one go

所以我从你的原始原始数据开始

So i start off with your original raw data

CREATE TABLE t
    ([I] varchar(5), [D] date, [P] DECIMAL(10,2))
;

INSERT INTO t
    ([I], [D], [P])
VALUES
    ('Bread', '2000-01-01 00:00:00', '10'),
    ('Bread', '2000-01-02 00:00:00', '9.5'),
    ('Bread', '2000-01-04 00:00:00', '9.1'),
    ('Sugar', '2000-01-01 00:00:00', '100'),
    ('Sugar', '2000-01-11 00:00:00', '150');

; with
-- number is a tally table. here i use recursive cte to generate 100 numbers
number as
(
    select  n = 0
    union all
    select  n = n + 1
    from    number
    where   n < 99
),
-- a cte to get the Price of next date and also day diff
cte as
(
    select  *, 
            nextP = lead(P) over(partition by I order by D),
            cnt = datediff(day, D, lead(D) over(partition by I order by D)) - 1
    from    t
) 
select  I, 
        D = dateadd(day, n, D), 
        P = coalesce(c.P + (c.nextP - c.P) / ( cnt + 1) * n, c.P)
from    cte c
        cross join number n
where   n.n <= isnull(c.cnt, 0)

drop table t

这篇关于使用与周围行数据的间隙距离成正比的值填充数据中的间隙?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆