How to group time by every n minutes in R

Question

I have a data frame with a lot of time-series data:

1   0:03    B   1
2   0:05    A   1
3   0:05    A   1
4   0:05    B   1
5   0:10    A   1
6   0:10    B   1
7   0:14    B   1
8   0:18    A   1
9   0:20    A   1
10  0:23    B   1
11  0:30    A   1

I want to group the times into 6-minute intervals and count the frequency of A and B in each interval:

1   0:06    A   2
2   0:06    B   2
3   0:12    A   1
4   0:12    B   1
5   0:18    A   1
6   0:24    A   1
7   0:24    B   1
8   0:18    A   1
9   0:30    A   1

Also, the time column is of class character. What should I do?

Recommended answer

Here's an approach: convert the times to POSIXct, cut them into 6-minute intervals, then count.

First, you need to specify the year, month, day, hour, minute, and second for each observation. Working with full date-times helps the approach scale to larger datasets.

library(tidyverse)
library(lubridate)

# sample data
d <- data.frame(t = paste0("2019-06-02 ", 
                           c("0:03","0:06","0:09","0:12","0:15",
                             "0:18","0:21","0:24","0:27","0:30"), 
                           ":00"),
                g = c("A","A","B","B","B"))  # 5 values, recycled across the 10 rows

d$t <- ymd_hms(d$t) # convert to POSIXct with `lubridate::ymd_hms()`

If you check the class of the converted column, you will see it is "POSIXct".

> class(d$t)
[1] "POSIXct" "POSIXt" 

Now that the data is "POSIXct", you can cut it into minute intervals. We will add this new grouping factor as a column called tc.

d$tc <- cut(d$t, breaks = "6 min")  
d
                     t g                  tc
1  2019-06-02 00:03:00 A 2019-06-02 00:03:00
2  2019-06-02 00:06:00 A 2019-06-02 00:03:00
3  2019-06-02 00:09:00 B 2019-06-02 00:09:00
4  2019-06-02 00:12:00 B 2019-06-02 00:09:00
5  2019-06-02 00:15:00 B 2019-06-02 00:15:00
6  2019-06-02 00:18:00 A 2019-06-02 00:15:00
7  2019-06-02 00:21:00 A 2019-06-02 00:21:00
8  2019-06-02 00:24:00 B 2019-06-02 00:21:00
9  2019-06-02 00:27:00 B 2019-06-02 00:27:00
10 2019-06-02 00:30:00 B 2019-06-02 00:27:00
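Note that cut() starts its first interval at the earliest observation, which is why the bins above begin at 00:03 rather than 00:00. If you would rather have bins aligned to the clock (00:00, 00:06, 00:12, ...), one option is to round each timestamp down yourself. The following is only a minimal base-R sketch, assuming the times are in UTC; the names bin_secs and tc_aligned are made up for illustration and are not part of the original answer.

```r
# Same sample times as above, parsed with base R and assumed to be UTC
t <- as.POSIXct(paste0("2019-06-02 ",
                       c("00:03", "00:06", "00:09", "00:12", "00:15",
                         "00:18", "00:21", "00:24", "00:27", "00:30"),
                       ":00"),
                tz = "UTC")

bin_secs <- 6 * 60  # bin width in seconds (hypothetical helper name)

# Round each timestamp down to a multiple of 6 minutes since the epoch,
# so bins start at :00, :06, :12, ... regardless of the first observation
tc_aligned <- as.POSIXct(floor(as.numeric(t) / bin_secs) * bin_secs,
                         origin = "1970-01-01", tz = "UTC")

format(tc_aligned, "%H:%M")
```

With this variant, 00:03 lands in the 00:00 bin and 00:09 in the 00:06 bin. In time zones whose UTC offset is not a whole multiple of the bin width, the bin edges would shift, so treat the UTC assumption as part of the sketch.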

Now you can group by this new interval (tc) and your grouping column (g), and count the frequency of occurrences. Getting the frequency of observations in a group is a fairly common operation, so dplyr provides count for this:

count(d, g, tc)
# A tibble: 7 x 3
  g     tc                      n
  <fct> <fct>               <int>
1 A     2019-06-02 00:03:00     2
2 A     2019-06-02 00:15:00     1
3 A     2019-06-02 00:21:00     1
4 B     2019-06-02 00:09:00     2
5 B     2019-06-02 00:15:00     1
6 B     2019-06-02 00:21:00     1
7 B     2019-06-02 00:27:00     2

If you run ?dplyr::count in the console, you'll see that count(d, g, tc) is roughly equivalent to group_by(d, g, tc) %>% summarise(n = n()).
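The question's own data has "H:MM" character times with no date, and its expected output labels each bin by its end (0:06, 0:12, ...). As a rough sketch of how that could be done in base R alone, the times can be converted to minutes since midnight and rounded up to the next multiple of n. The variable names here (times, groups, bin_end, counts) are illustrative, and the right-closed binning is my reading of the expected table, not something the question states outright.

```r
# Times and groups copied from the question, both as character vectors
times  <- c("0:03", "0:05", "0:05", "0:05", "0:10", "0:10",
            "0:14", "0:18", "0:20", "0:23", "0:30")
groups <- c("B", "A", "A", "B", "A", "B", "B", "A", "A", "B", "A")

# Split "H:MM" and convert each time to minutes since midnight
parts   <- strsplit(times, ":", fixed = TRUE)
minutes <- vapply(parts,
                  function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
                  integer(1))

# Round up to the end of the n-minute bin; a time exactly on a boundary
# (such as 0:18) stays in the bin that ends at that boundary
n       <- 6L
bin_end <- as.integer(ceiling(minutes / n) * n)

# Count rows per (group, bin), then sort by bin and group
counts <- aggregate(n ~ g + bin_end,
                    data = data.frame(g = groups, bin_end = bin_end, n = 1L),
                    FUN = sum)
counts <- counts[order(counts$bin_end, counts$g), ]

# Add a readable "H:MM" label for the end of each bin
counts$bin <- sprintf("%d:%02d",
                      as.integer(counts$bin_end) %/% 60L,
                      as.integer(counts$bin_end) %% 60L)
counts
```

This yields nine (bin, group) rows, including a 0:18 B row (from the 0:14 observation) that the hand-written expected table in the question leaves out. Sorting on the numeric bin_end rather than on the label avoids "10:00" sorting before "2:00" as text.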
