How to subsample a data frame based on a datetime column in R


Problem description


I would like to subsample a data frame at hourly intervals from a datetime column, beginning with the time value in the first row of the data frame. My data frame runs at 10-minute intervals from the first to the last row. Example data is below:

structure(list(datetime = structure(1:19, .Label = c("30/03/2011 05:09", 
"30/03/2011 05:19", "30/03/2011 05:29", "30/03/2011 05:39", "30/03/2011 05:49", 
"30/03/2011 05:59", "30/03/2011 06:09", "30/03/2011 06:19", "30/03/2011 06:29", 
"30/03/2011 06:39", "30/03/2011 06:49", "30/03/2011 06:59", "30/03/2011 07:09", 
"30/03/2011 07:19", "30/03/2011 07:29", "30/03/2011 07:39", "30/03/2011 07:49", 
"30/03/2011 07:59", "30/03/2011 08:09"), class = "factor"), a_count = c(66L, 
34L, 33L, 20L, 12L, 44L, 36L, 29L, 21L, 22L, 17L, 38L, 24L, 19L, 
60L, 54L, 27L, 36L, 45L), b_count = c(166.49, 167.54, 168.31, 
168.81, 169.24, 169.61, 169.96, 170.29, 170.63, 170.98, 171.31, 
171.62, 171.94, 172.29, 172.68, 173.15, 173.71, 174.34, 174.99
)), .Names = c("datetime", "a_count", "b_count"), class = "data.frame", row.names = c(NA, 
-19L))

df

           datetime a_count b_count
1  30/09/2011 05:09      66  166.49
2  30/09/2011 05:19      34  167.54
3  30/09/2011 05:29      33  168.31
4  30/09/2011 05:39      20  168.81
5  30/09/2011 05:49      12  169.24
6  30/09/2011 05:59      44  169.61
7  30/09/2011 06:09      36  169.96
8  30/09/2011 06:19      29  170.29
9  30/09/2011 06:29      21  170.63
10 30/09/2011 06:39      22  170.98
11 30/09/2011 06:49      17  171.31
12 30/09/2011 06:59      38  171.62
13 30/09/2011 07:09      24  171.94
14 30/09/2011 07:19      19  172.29
15 30/09/2011 07:29      60  172.68
16 30/09/2011 07:39      54  173.15
17 30/09/2011 07:49      27  173.71
18 30/09/2011 07:59      36  174.34
19 30/09/2011 08:09      45  174.99

I would like to end up with the following data frame:

        datetime   a_count b_count
30/09/2011 05:09       66  166.49
30/09/2011 06:09       36  169.96
30/09/2011 07:09       24  171.94
30/09/2011 08:09       45  174.99

Any suggestions would be greatly appreciated!

Solution

It is hard to guess what structure you have. Is it guaranteed that there is exactly one value at the first time value plus a whole multiple of 60 minutes? What should happen if no value can be found? What should happen if there are two values at that time? Do you need approximate matching, for example should 09:10 count as 09:09?
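If approximate matching is needed, one possible sketch (not part of the original answer) is to take, for each hourly target, the nearest reading and keep it only if it lies within a tolerance; targets with no close reading are simply dropped. The helper name `nearest_hourly`, the 5-minute tolerance, the UTC time zone and the sample data (a cut-down, hypothetical variant of the question's data with the 06:09 reading moved to 06:10 and the 07:09 reading removed) are all assumptions for illustration.

```r
# Hypothetical helper (not from any package): for each hourly target, keep the
# nearest reading, but only if it is within `tol_min` minutes of the target.
nearest_hourly <- function(d, tol_min = 5) {
  # Targets: the first timestamp plus whole multiples of one hour
  targets <- seq(d$datetime[1], max(d$datetime), by = "hour")
  idx <- vapply(seq_along(targets), function(k) {
    # Absolute distance, in minutes, from every reading to this target
    gaps <- abs(as.numeric(difftime(d$datetime, targets[k], units = "mins")))
    i <- which.min(gaps)
    if (gaps[i] <= tol_min) i else NA_integer_
  }, integer(1))
  # Drop the targets that had no reading close enough
  d[idx[!is.na(idx)], ]
}

# Assumed sample data: 06:09 shifted to 06:10, 07:09 missing (UTC assumed)
d <- data.frame(
  datetime = as.POSIXct(c("30/03/2011 05:09", "30/03/2011 05:59",
                          "30/03/2011 06:10", "30/03/2011 07:19",
                          "30/03/2011 08:09"),
                        format = "%d/%m/%Y %H:%M", tz = "UTC"),
  a_count = c(66L, 44L, 36L, 19L, 45L)
)

res <- nearest_hourly(d)
res
```

Here the 07:09 target is dropped because the closest reading (07:19) is 10 minutes away. Keeping the tolerance under 30 minutes also ensures that two hourly targets cannot pick the same row.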

One idea to get you started is the following:

# I will call your dataframe `d`. 
# Transform datetime to a POSIXct object, R's datatype for timestamps
d$datetime <- as.POSIXct(as.character(d$datetime), format='%d/%m/%Y %H:%M')
# Extract the minutes
d$minute <- as.numeric(format(d$datetime, '%M'))
# And select by identical minute.
subset(d, minute == d$minute[1])
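A variant that follows the "first time value plus a multiple of 60 minutes" rule literally, rather than matching on the minute, is to build the sequence of target times with `seq()` and keep the rows whose timestamp is exactly one of them. This is a sketch, not the original answer's method: it rebuilds the question's data compactly with `seq(..., by = "10 min")` instead of the `structure()` call, and uses UTC as an assumed time zone.

```r
# The question's data, rebuilt compactly: 19 readings at 10-minute intervals
# (UTC is an assumed time zone)
d <- data.frame(
  datetime = seq(as.POSIXct("30/03/2011 05:09", format = "%d/%m/%Y %H:%M",
                            tz = "UTC"),
                 by = "10 min", length.out = 19),
  a_count = c(66L, 34L, 33L, 20L, 12L, 44L, 36L, 29L, 21L, 22L,
              17L, 38L, 24L, 19L, 60L, 54L, 27L, 36L, 45L),
  b_count = c(166.49, 167.54, 168.31, 168.81, 169.24, 169.61, 169.96,
              170.29, 170.63, 170.98, 171.31, 171.62, 171.94, 172.29,
              172.68, 173.15, 173.71, 174.34, 174.99)
)

# Target times: the first timestamp plus whole multiples of one hour
targets <- seq(d$datetime[1], max(d$datetime), by = "hour")

# Keep only the rows whose timestamp is exactly one of the targets
hourly <- d[d$datetime %in% targets, ]
hourly
```

This keeps rows 1, 7, 13 and 19, matching the desired output. For strictly regular 10-minute data, `d[seq(1, nrow(d), by = 6), ]` selects the same rows, but the timestamp-based version still picks the right rows if an in-between reading is missing.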
