How to perform R time based resampling with a given time period equivalently to using pandas 'resample' functions?

Posted 2020/10/19 0:17:35 (71 views). Tags: r, datetime, time-series

Problem description

I am trying to find an R equivalent of the following pandas resampling operation:

Example original data frame df:

                      FT
Time                     
2017-03-18 23:30:00  73.9
2017-03-18 23:31:00  73.5
2017-03-18 23:32:00  71.6
2017-03-18 23:33:00  71.3
2017-03-18 23:34:00  72.3
2017-03-18 23:35:00  72.1
2017-03-18 23:36:00  70.1
2017-03-18 23:37:00  67.9
2017-03-18 23:38:00  65.4
2017-03-18 23:39:00  63.4
2017-03-18 23:40:00  61.3
2017-03-18 23:41:00  59.9
2017-03-18 23:42:00  58.4
2017-03-18 23:43:00  58.4
2017-03-18 23:44:00  55.6
2017-03-18 23:45:00  54.3
2017-03-18 23:46:00  54.3
2017-03-18 23:47:00  53.0
2017-03-18 23:48:00  51.9
2017-03-18 23:49:00  50.8
2017-03-18 23:50:00  49.8
2017-03-18 23:51:00  48.9
2017-03-18 23:52:00  47.6
2017-03-18 23:53:00  44.5
2017-03-18 23:54:00  57.2
2017-03-18 23:55:00  61.6
2017-03-18 23:56:00  59.8
2017-03-18 23:57:00  58.0
2017-03-18 23:58:00  56.2
2017-03-18 23:59:00  56.2

Resampling:

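import pandas as pd

date_format = '%d-%b-%Y %H:%M:%S'                        # e.g. 16-Feb-2017 11:00:00
df.index = pd.to_datetime(df.index, format=date_format)  # text -> DatetimeIndex
df = df.resample('5min').mean()                          # mean of each 5-minute bin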

Output:

                  FT
Time                      
2017-03-18 23:30:00  72.52
2017-03-18 23:35:00  67.78
2017-03-18 23:40:00  58.72
2017-03-18 23:45:00  52.86
2017-03-18 23:50:00  49.60
2017-03-18 23:55:00  58.36

I would like to know the simplest way to resample a data frame with a given aggregate function (e.g. mean, sum) and a given sampling period. In pandas, as I understand it, no interpolation is involved: resample simply performs a group-by operation on time bins.
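The following minimal pandas sketch makes the group-by reading concrete. It uses the 30 sample rows above and checks `resample("5min").mean()` against an explicit `groupby` on timestamps floored to 5 minutes. It also shows one difference worth keeping in mind when choosing an R equivalent. `resample` emits a row (with NaN) for every empty bin between the first and last timestamp. A plain group-by only returns bins that actually contain data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# FT readings from the example above: one per minute, 23:30 to 23:59
ft = [73.9, 73.5, 71.6, 71.3, 72.3, 72.1, 70.1, 67.9, 65.4, 63.4,
      61.3, 59.9, 58.4, 58.4, 55.6, 54.3, 54.3, 53.0, 51.9, 50.8,
      49.8, 48.9, 47.6, 44.5, 57.2, 61.6, 59.8, 58.0, 56.2, 56.2]
idx = pd.date_range("2017-03-18 23:30:00", periods=len(ft), freq="min", name="Time")
df = pd.DataFrame({"FT": ft}, index=idx)

# resample: 5-minute bins labelled by their left edge, aggregated with mean
by_resample = df.resample("5min").mean()

# the same computation written as an explicit group-by on floored timestamps
by_groupby = df.groupby(df.index.floor("5min")).mean()

same_bins = by_resample.index.equals(by_groupby.index)
same_means = np.allclose(by_resample["FT"], by_groupby["FT"])
print(by_resample.round(2))
print("identical:", same_bins and same_means)

# With a gap (23:35-23:39 removed), the two approaches differ
gappy = df.drop(df.index[5:10])
gap_resample = gappy.resample("5min").mean()                   # keeps 23:35 as NaN
gap_groupby = gappy.groupby(gappy.index.floor("5min")).mean()  # has no 23:35 row
print(len(gap_resample), len(gap_groupby))
```

Swapping `.mean()` for `.sum()`, `.max()` and so on changes the aggregate in both forms the same way.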

I am guessing the conversion to a datetime could be done this way:

df$Time <- strptime(df$Time, "%d-%b-%Y %H:%M:%S")

but I am not sure which R package to use for the resampling step itself.
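The strptime call above uses the format codes %d (day), %b (abbreviated month name), %Y (four-digit year) and %H:%M:%S (clock time). Python's `datetime.strptime` uses the same C-style codes, so it offers a quick way to sanity-check the pattern against a raw value such as `16-Feb-2017 11:00:00`, taken from the read_csv output in the edit. This is only a sketch. Two points also apply in R. First, %b depends on the current locale, so English month abbreviations may not parse under another locale. Second, R's strptime() returns a POSIXlt object. as.POSIXct() (used in the update) returns POSIXct, the class that data-frame tools generally expect.

```python
from datetime import datetime

# Same pattern as in the R strptime() / as.POSIXct() calls
fmt = "%d-%b-%Y %H:%M:%S"

# Raw values copied from the read_csv output in the edit
raw = ["16-Feb-2017 11:00:00", "16-Feb-2017 11:09:00"]
parsed = [datetime.strptime(s, fmt) for s in raw]
for s, t in zip(raw, parsed):
    print(s, "->", t.isoformat(sep=" "))

# A string in a different layout does not match the pattern.
# Python raises ValueError here; R's strptime() returns NA instead.
mismatch_raised = False
try:
    datetime.strptime("2017-03-18 23:30:00", fmt)
except ValueError as err:
    mismatch_raised = True
    print("no match:", err)
```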

Thank you

Edit:

Using readr's read_csv, I obtain:

# A tibble: 43,981 × 6
                   Time Power   Tin    FT    RT  Flow
*                 <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1  16-Feb-2017 11:00:00  0.09 18.87  57.9  53.3    17
2  16-Feb-2017 11:01:00  0.09 18.87  57.9  53.3    17
3  16-Feb-2017 11:02:00  0.09 18.87  57.9  53.3    17
4  16-Feb-2017 11:03:00  0.09 18.87  57.9  53.3    17
5  16-Feb-2017 11:04:00  0.09 18.87  57.9  53.3    17
6  16-Feb-2017 11:05:00  0.09 18.87  57.9  53.3    17
7  16-Feb-2017 11:06:00  0.09 18.87  57.9  53.3    17
8  16-Feb-2017 11:07:00  0.09 18.87  57.9  53.3    17
9  16-Feb-2017 11:08:00  0.09 18.87  57.9  53.3    17
10 16-Feb-2017 11:09:00  0.09 18.87  57.9  53.3    17
# ... with 43,971 more rows

but running

df %>% thicken("5 min") %>% group_by(Time_5_min) %>% summarise(mean(FT))

gives the following error:

Error: x does not contain a variable of class Date, POSIXct, or POSIXlt.
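The error comes from the column type. The tibble header shows Time as `<chr>`, so thicken() finds no Date, POSIXct or POSIXlt column to work on. pandas' `resample` has the same precondition: it needs a DatetimeIndex (or a TimedeltaIndex/PeriodIndex). The following sketch demonstrates this with two rows from the tibble above. `resample` raises a TypeError while the timestamps are plain strings and works once they are parsed with `pd.to_datetime`, the pandas counterpart of the as.POSIXct() conversion.

```python
import pandas as pd

# Time stored as text, like the <chr> column that read_csv produced
df = pd.DataFrame({
    "Time": ["16-Feb-2017 11:00:00", "16-Feb-2017 11:01:00"],
    "FT": [57.9, 57.9],
})

failed_on_strings = False
try:
    df.set_index("Time").resample("5min").mean()
except TypeError as err:
    failed_on_strings = True
    print("resample on strings:", err)

# After parsing, the index is a DatetimeIndex and resampling works
df["Time"] = pd.to_datetime(df["Time"], format="%d-%b-%Y %H:%M:%S")
fixed = df.set_index("Time").resample("5min").mean()
print(fixed)
```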

Update:

The solution given by @Edwin works well.

I used the following conversion to datetime:

df$Time <- as.POSIXct(df$Time, format = "%d-%b-%Y %H:%M:%S")

Solution

Use dplyr and padr. thicken() needs Time to be a datetime variable. In the edit above, readr did not recognise the 16-Feb-2017 11:00:00 layout and read Time as <chr>, so the second line converts it with anytime::anytime().

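library(dplyr); library(padr)
df$Time <- anytime::anytime(df$Time)   # character -> POSIXct
df %>%
  thicken("5 min") %>%                 # adds Time_5_min: each Time mapped to its 5-minute interval
  group_by(Time_5_min) %>%
  summarise(FT = mean(FT))             # one row per 5-minute interval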
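For readers coming from pandas, the dplyr/padr pipeline maps step by step onto pandas:

1. anytime() (or as.POSIXct()) parses the text.
2. thicken("5 min") adds a column assigning each timestamp to its 5-minute interval.
3. group_by() + summarise() aggregate per interval.

The following sketch reproduces those three steps on the question's first ten FT readings. The function name `thicken_and_summarise`, its parameters and the `Time_5min` column name are hypothetical, chosen for illustration. They are not a padr or pandas API.

```python
import pandas as pd

def thicken_and_summarise(df, time_col, interval, agg="mean"):
    """Hypothetical helper mirroring thicken() + group_by() + summarise().

    interval is a pandas frequency string such as "5min"; agg is any
    aggregation name pandas accepts, e.g. "mean" or "sum".
    """
    out = df.copy()
    # step 1: parse text timestamps (the anytime()/as.POSIXct() step)
    out[time_col] = pd.to_datetime(out[time_col])
    # step 2: add the interval column (the thicken() step); floor rounds down
    bin_col = f"{time_col}_{interval}"  # e.g. Time_5min (padr names it Time_5_min)
    out[bin_col] = out[time_col].dt.floor(interval)
    # step 3: one row per interval (the group_by() + summarise() step)
    value_cols = [c for c in out.columns if c not in (time_col, bin_col)]
    return out.groupby(bin_col, as_index=False).agg({c: agg for c in value_cols})

raw = pd.DataFrame({
    "Time": [f"2017-03-18 23:{m}:00" for m in range(30, 40)],
    "FT": [73.9, 73.5, 71.6, 71.3, 72.3, 72.1, 70.1, 67.9, 65.4, 63.4],
})
means = thicken_and_summarise(raw, "Time", "5min")
sums = thicken_and_summarise(raw, "Time", "5min", agg="sum")
print(means)
print(sums)
```

Like thicken() + group_by(), this sketch returns no row for an interval without observations, whereas pandas' `resample` does. In R, padr's pad() can insert such missing intervals as rows with NA if you need resample-style output.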
