GCP DataPrep-移动窗口 [英] GCP DataPrep- moving window

查看:164
本文介绍了GCP DataPrep-移动窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个以下格式的CSV文件,试图与GCP dataprep冲突.

I have a CSV file of the following format that I am trying to wrangle with GCP dataprep.

   Timestamp                 Tag           Value
   2018-05-01 09:00:00     Temperature     40.1
   2018-05-01 09:00:00     Humidity        80
   2018-05-01 09:05:00     Temperature     40.2
   2018-05-01 09:05:00     Humidity        80
   2018-05-01 09:10:00     Temperature     40.0
   2018-05-01 09:10:00     Humidity        82

数据以5分钟为间隔延长2周. 我想对其进行转换,以便每隔10分钟显示一次前10分钟的平均值(或最小值/最大值/中位数),并对其进行旋转,以使最终结果如下:

The data extends in 5 minutes interval for 2 weeks. I would like to transform it such that at each 10 minute interval, I am displaying the average(or min/max/median) of the previous 10 minutes and also pivot it,so that the end result I get is as follows:

  Timestamp             Temperature    Humidity
 2018-05-01 09:10:00      40.1         80.06

从本质上讲,09:10的值是09:00、09:05和09:10的平均值.

So essentially, the value at 09:10 is the average of the values at 09:00, 09:05 and 09:10.

09:25的值将是09:15,09:20,09:25的平均值.

And the value at 09:25 would be the average at 09:15,09:20,09:25.

我已经尝试了窗口函数和聚合,但是似乎无法使其正常工作.

I have tried window functions and aggregate, but seem to be unable to get it to work.

谢谢您的输入!

推荐答案

我将从重新格式化每行格式为的数据开始

I would start by reformatting the data were each line is of the form:

  Timestamp                Temperature      Humidity 
  2018-05-01 09:00:00      40.1             80
  2018-05-01 09:05:00      40.2             80
  2018-05-01 09:10:00      40               82
  2018-05-01 09:15:00      41               81
  2018-05-01 09:20:00      40.8             81
  2018-05-01 09:25:00      40.2             84

这将使数据大小减少一半,并减少正在使用的存储空间量.您可以在Dataprep中使用将值转换为列" 函数.这将为温度创建一个列,为湿度创建一个列.

This will reduce the size of your data in half and reduce the amount of storage space you are utilizing. You can use the ‘convert values to columns’ function in Dataprep. This will create a column for Temperature and one for Humidity.

采用该格式后,您就可以使用窗口函数 ROLLINGAVERAGE 计算平均值.通过设置以下参数,使函数计算在其前面的两行的平均值:

Once in that format, you can use the window function ROLLINGAVERAGE to compute the averages. Make the function compute the average for the two rows which precedes it by setting the following parameters:

rollingaverage(Temperature, 2, 0)

第一个参数是列名,第二个参数是前面的行,最后一个参数是后面的行的数目,在这些行上计算平均值.对湿度"列重复以上功能.结果应类似于以下内容:

The first argument is the column name, the second is the preceding rows, and the last one is the number of following rows on which the average is computed. Repeat the above function for the Humidity column. The results should resemble the following:

接下来,在过滤器行"选项下,使用固定间隔的行数" 函数,使第三行从第三个条目开始.设置参数如下:

Next, under the filter rows option, use the ‘rows at regular intervals’ function to keep every third row starting from the third entry. Set the parameter as follows:

删除两个原始的温度和湿度"行,您将获得以下内容:

Remove the two original Temperature and Humidity rows and you will get the following:

  Timestamp                Temperature      Humidity 
  2018-05-01 09:10:00      40.1             80.67
  2018-05-01 09:25:00      41               82

这篇关于GCP DataPrep-移动窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆