BigQuery,使用线性插值填充缺失值 [英] BigQuery, fill missing values with Linear interpolation

查看:57
本文介绍了BigQuery,使用线性插值填充缺失值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在Bigquery中有一个表格,该表格每30分钟显示一次数据,我想每5分钟显示一次数据,目前我正在使用此查询将现有值填充为空值

I have a table in Bigquery with data every 30 minutes, I want to show the data every 5 minutes, currently I am using this query to fill the null values with the existing values

SELECT
SETTLEMENTDATE,DUID,
LAST_VALUE(SCADAVALUE ignore nulls) OVER (
    PARTITION BY DUID ORDER BY SETTLEMENTDATE) AS SCADAVALUE from x

相反,可以进行线性插值吗?

instead, is it possible to do Linear interpolation, something like this

我的列结算日期为5分钟,SCADAVALUEORIGIN列的值非常为30分钟,否则为空,我想添加一列SCADAINTERPOLATION,该值在30的两个值之间均匀分布分钟,另一个问题是,当我每5分钟刷新一次数据时,最后一个值将在(5,10,15,20,25)分钟内显示为空,我希望我的解释很清楚

I have the column settlement date which is by 5 minutes, the column SCADAVALUEORIGIN Which has a value very 30 minutes, otherwise it is null, I want to add a column SCADAINTERPOLATION, which spread the values evenly between two the values of 30 minute, another issue is, as I refresh the data every 5 minutes, the last value will show null for (5,10,15,20,25) minutes, I hope, my explanation is clear

推荐答案

以下是BigQuery标准SQL

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  TIMESTAMP_ADD(SETTLEMENTDATE, INTERVAL 5 * i MINUTE) AS SETTLEMENTDATE, 
  IF(i = 0, SCADAVALUEORIGIN, NULL) AS SCADAVALUEORIGIN,
  SCADAVALUEORIGIN AS SCADAVALUE,
  ROUND(SCADAVALUEORIGIN + IFNULL((next_value - SCADAVALUEORIGIN) / 6 * i, 0), 3) AS SCADAINTERPOLATION
FROM (
  SELECT SETTLEMENTDATE, SCADAVALUEORIGIN, 
    LEAD(SCADAVALUEORIGIN) OVER(ORDER BY SETTLEMENTDATE) next_value,
  FROM `project.dataset.table`
), UNNEST(GENERATE_ARRAY(0, 5)) i 

如果要应用于您的问题的样本数据-结果为

if to apply to sample data from your question - result is

这篇关于BigQuery,使用线性插值填充缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆