Time-based averaging (sliding window) of columns in a data.frame


Problem description

I have a data.frame with multiple columns. One of the columns is time and is thus non-decreasing. The remaining columns contain observations recorded at the time given in the same row.

I want to select a window of time, say "x" seconds, and calculate the average (or for that matter any function) of the entries in some other columns in the same data.frame for that window.

Of course, because it's a time-based average, the number of rows that fall within a window can vary depending on the data.

I have done this using a custom function, which creates a new column in the data.frame. The new column assigns a single number to all the entries in a time window, and that number is unique across all time windows. This essentially divides the data into groups based on the time windows. Then I use R's "aggregate" function to calculate the mean.
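For readers who want to see the shape of that approach, here is a minimal sketch in base R. The helper name `window_id`, the column names, and the sample data are all made up for illustration; the original custom function was not posted, so this is an assumption about how it might look, not a reconstruction of it.

```r
# Hypothetical helper: label each row with the index of its time window.
# `time` is POSIXct (or numeric seconds); `width` is the window length in seconds.
window_id <- function(time, width) {
  t <- as.numeric(time)
  # Windows are counted from the first observation, starting at 1.
  (t - t[1]) %/% width + 1
}

# Made-up example data with irregularly spaced timestamps.
df <- data.frame(
  time = as.POSIXct("2010-10-20 13:34:00", tz = "UTC") +
    c(0, 1, 2, 6, 7, 11, 12, 13, 14),
  x = c(1, 2, 3, 10, 20, 5, 5, 5, 5)
)
df$window <- window_id(df$time, width = 5)

# Mean of x per window; the three windows hold 3, 2 and 4 rows.
res <- aggregate(x ~ window, data = df, FUN = mean)
print(res)  # window means are 2, 15 and 5
```

Any other summary function (median, sd, and so on) can be passed as FUN in place of mean.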

I was just wondering if there is an existing R function that can do the grouping based on a time interval or if there is a better (cleaner) way to do this.

Solution

Assuming your data.frame contains only numeric data, this is one way to do it using zoo/xts:

> library(xts)  # also loads zoo
> Data <- data.frame(Time=Sys.time()+1:20,x=rnorm(20))
> xData <- xts(Data[,-1], Data[,1])
> period.apply(xData, endpoints(xData, "seconds", 5), colMeans)
                           [,1]
2010-10-20 13:34:19 -0.20725660
2010-10-20 13:34:24 -0.01219346
2010-10-20 13:34:29 -0.70717312
2010-10-20 13:34:34  0.09338097
2010-10-20 13:34:38 -0.22330363

EDIT: The same can be done using only base R packages. The means are the same, but the times are slightly different because endpoints starts the 5-second interval with the first observation, whereas the code below groups on 5-second intervals starting with seconds = 0.

> nSeconds <- 5
> agg <- aggregate(Data[,-1], by=list(as.numeric(Data$Time) %/% nSeconds), mean)
> agg[,1] <- .POSIXct(agg[,1]*nSeconds)  # >= R-2.12.0 required for .POSIXct
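Both snippets above put the rows into non-overlapping 5-second buckets. If what is wanted is a true sliding window (one value per row, computed over the preceding x seconds), a rough base R sketch could look like the following. The function name `rolling_time_apply` and the sample data are hypothetical, and the choice of a trailing window that is open on the left and closed on the right is an assumption, not something stated in the question.

```r
# Hypothetical helper: for each row i, apply `fun` to the values whose
# timestamps fall in the trailing window (t_i - width, t_i].
rolling_time_apply <- function(time, x, width, fun = mean) {
  t <- as.numeric(time)
  vapply(seq_along(t), function(i) {
    in_window <- t > t[i] - width & t <= t[i]
    fun(x[in_window])
  }, numeric(1))
}

# Made-up example data with irregularly spaced timestamps.
time <- as.POSIXct("2010-10-20 13:34:00", tz = "UTC") + c(0, 1, 2, 6, 7)
x <- c(1, 2, 3, 10, 20)

roll <- rolling_time_apply(time, x, width = 5)
print(roll)  # 1.0 1.5 2.0 6.5 15.0
```

This compares every row against every other row, so it is quadratic in the number of rows; that is fine for small data, but large data would need a smarter approach.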
