How to group time by every n minutes in R
Question
I have a dataframe with a lot of time series:
1 0:03 B 1
2 0:05 A 1
3 0:05 A 1
4 0:05 B 1
5 0:10 A 1
6 0:10 B 1
7 0:14 B 1
8 0:18 A 1
9 0:20 A 1
10 0:23 B 1
11 0:30 A 1
I want to group the times into 6-minute intervals and count the frequency of A and B in each interval:
1 0:06 A 2
2 0:06 B 2
3 0:12 A 1
4 0:12 B 1
5 0:18 A 1
6 0:24 A 1
7 0:24 B 1
8 0:18 A 1
9 0:30 A 1
Also, the class of the time column is character. What should I do?
Recommended answer
Here's an approach: convert the times to POSIXct, cut them into 6-minute intervals, then count.
First, you need to specify the year, month, day, hour, minute, and second of your data. This will help with scaling it to larger datasets.
library(tidyverse)
library(lubridate)
# sample data
d <- data.frame(
  t = paste0("2019-06-02 ",
             c("0:03", "0:06", "0:09", "0:12", "0:15",
               "0:18", "0:21", "0:24", "0:27", "0:30"),
             ":00"),
  g = c("A", "A", "B", "B", "B")  # 5 values, recycled to fill all 10 rows
)
d$t <- ymd_hms(d$t)  # convert to POSIXct with `lubridate::ymd_hms()`
If you check the class of your new date column, you will see it is "POSIXct".
> class(d$t)
[1] "POSIXct" "POSIXt"
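The same conversion can be applied to the character times from the question. Below is a minimal base R sketch, assuming all times fall on a single day. The date 2019-06-02 and the names `times` and `parsed` are placeholders for illustration, not part of the original data.

```r
# Character times as in the question (first few rows only)
times <- c("0:03", "0:05", "0:05", "0:10")

# Prepend a placeholder date (an assumption) and parse as hours:minutes.
# A fixed tz = "UTC" keeps the result independent of the local time zone.
parsed <- as.POSIXct(paste("2019-06-02", times),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")

class(parsed)            # check the class, as above
format(parsed, "%H:%M")  # show the parsed times of day
```

If the real data spans several days, the actual date of each observation would need to be pasted in instead of a single placeholder.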
Now that the data is "POSIXct", you can cut it into minute intervals. We will add this new grouping factor as a new column called tc.
d$tc <- cut(d$t, breaks = "6 min")
d
t g tc
1 2019-06-02 00:03:00 A 2019-06-02 00:03:00
2 2019-06-02 00:06:00 A 2019-06-02 00:03:00
3 2019-06-02 00:09:00 B 2019-06-02 00:09:00
4 2019-06-02 00:12:00 B 2019-06-02 00:09:00
5 2019-06-02 00:15:00 B 2019-06-02 00:15:00
6 2019-06-02 00:18:00 A 2019-06-02 00:15:00
7 2019-06-02 00:21:00 A 2019-06-02 00:21:00
8 2019-06-02 00:24:00 B 2019-06-02 00:21:00
9 2019-06-02 00:27:00 B 2019-06-02 00:27:00
10 2019-06-02 00:30:00 B 2019-06-02 00:27:00
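Note that the bins above start at 00:03, the earliest time in the sample data, and each bin is labelled by its start. The question's desired output instead uses bins aligned to the clock and labelled by their end (0:06, 0:12, ...). One possible way to get that labelling, sketched here in base R, is to round each time up to the next multiple of 6 minutes. The date, the time zone, and the names `tm`, `width`, and `bin_end` are assumptions made for this illustration.

```r
# Times and groups from the question, with a placeholder date (an assumption)
times <- c("0:03", "0:05", "0:05", "0:05", "0:10", "0:10",
           "0:14", "0:18", "0:20", "0:23", "0:30")
g <- c("B", "A", "A", "B", "A", "B", "B", "A", "A", "B", "A")
tm <- as.POSIXct(paste("2019-06-02", times),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")

width <- 6 * 60  # bin width in seconds

# Round each time up to the next multiple of 6 minutes since the epoch.
# A time already on a boundary (e.g. 0:18) keeps its own label.
bin_end <- as.POSIXct(ceiling(as.numeric(tm) / width) * width,
                      origin = "1970-01-01", tz = "UTC")

data.frame(time = times, g = g, bin = format(bin_end, "%H:%M"))
```

This relies on midnight UTC being an exact multiple of 6 minutes since the epoch; with a time zone whose offset is not a multiple of the bin width, the boundaries would shift.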
Now you can group by this new interval (tc) and your grouping column (g), and count the frequency of occurrences. Getting the frequency of observations in a group is a fairly common operation, so dplyr provides count for this:
count(d, g, tc)
# A tibble: 7 x 3
g tc n
<fct> <fct> <int>
1 A 2019-06-02 00:03:00 2
2 A 2019-06-02 00:15:00 1
3 A 2019-06-02 00:21:00 1
4 B 2019-06-02 00:09:00 2
5 B 2019-06-02 00:15:00 1
6 B 2019-06-02 00:21:00 1
7 B 2019-06-02 00:27:00 2
If you run ?dplyr::count in the console, you'll see that count(d, g, tc) is roughly equivalent to group_by(d, g, tc) %>% summarise(n = n()).
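For comparison, the same kind of per-group count can be obtained without dplyr. This is a rough base R sketch, not the answer's method: `g` and `tc` are small stand-in vectors (hypothetical, loosely mirroring the first rows of the example) rather than the full data frame.

```r
# Stand-in vectors (hypothetical): group labels and interval labels
g  <- c("A", "A", "B", "B", "B", "A")
tc <- c("00:03", "00:03", "00:09", "00:09", "00:15", "00:15")

# Cross-tabulate, then reshape to long form with one row per (g, tc) pair
counts <- as.data.frame(table(g = g, tc = tc), responseName = "n",
                        stringsAsFactors = FALSE)

# table() also reports combinations that never occur; drop those rows
counts <- counts[counts$n > 0, ]
counts
```

Unlike count(), table() enumerates every combination of the two variables, which is why the zero-count rows have to be filtered out afterwards.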