How to group time by every n minutes in R


Problem description

I have a data frame with a lot of time series data:

1   0:03    B   1
2   0:05    A   1
3   0:05    A   1
4   0:05    B   1
5   0:10    A   1
6   0:10    B   1
7   0:14    B   1
8   0:18    A   1
9   0:20    A   1
10  0:23    B   1
11  0:30    A   1

I want to group the times into 6-minute intervals and count the frequency of A and B in each interval:

1   0:06    A   2
2   0:06    B   2
3   0:12    A   1
4   0:12    B   1
5   0:18    A   1
6   0:24    A   1
7   0:24    B   1
8   0:18    A   1
9   0:30    A   1

Also, the time column is of class character. What should I do?

Recommended answer

Here's an approach: convert the times to POSIXct, cut them into 6-minute intervals, then count.

First, you need to specify the year, month, day, hour, minute, and second for your data. Full timestamps also help when scaling this up to larger datasets.

library(tidyverse)
library(lubridate)

# sample data
d <- data.frame(t = paste0("2019-06-02 ", 
                           c("0:03","0:06","0:09","0:12","0:15",
                             "0:18","0:21","0:24","0:27","0:30"), 
                           ":00"),
                g = rep(c("A","A","B","B","B"), times = 2)) # repeat explicitly to fill all 10 rows

d$t <- ymd_hms(d$t) # convert to POSIXct with `lubridate::ymd_hms()`

If you check the class of the new date column, you will see it is "POSIXct".

> class(d$t)
[1] "POSIXct" "POSIXt" 
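
The times in the question are character strings such as "0:03" with no date attached. As a minimal sketch of the same conversion in base R, assuming all rows fall on a single day: the vector name time_chr and the date "2019-06-02" are placeholders chosen for illustration, not part of the original data.

```r
# Hypothetical vector holding a few of the question's character times
time_chr <- c("0:03", "0:05", "0:10", "0:30")

# Prepend an arbitrary date (assumption: all rows fall on the same day)
# and parse with an explicit format and time zone
tm <- as.POSIXct(paste("2019-06-02", time_chr),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")

class(tm)            # the parsed vector's class
format(tm, "%H:%M")  # the times, zero-padded
```

Setting tz explicitly keeps the result independent of the machine's local time zone.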

Now that the data is in "POSIXct", you can cut it by minute intervals. We will add this new grouping factor as a new column called tc.

d$tc <- cut(d$t, breaks = "6 min")  
d
                     t g                  tc
1  2019-06-02 00:03:00 A 2019-06-02 00:03:00
2  2019-06-02 00:06:00 A 2019-06-02 00:03:00
3  2019-06-02 00:09:00 B 2019-06-02 00:09:00
4  2019-06-02 00:12:00 B 2019-06-02 00:09:00
5  2019-06-02 00:15:00 B 2019-06-02 00:15:00
6  2019-06-02 00:18:00 A 2019-06-02 00:15:00
7  2019-06-02 00:21:00 A 2019-06-02 00:21:00
8  2019-06-02 00:24:00 B 2019-06-02 00:21:00
9  2019-06-02 00:27:00 B 2019-06-02 00:27:00
10 2019-06-02 00:30:00 B 2019-06-02 00:27:00
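
Note that cut() labels each bin by its start and begins counting at the first observation (00:03 here), whereas the desired output in the question labels bins by their end on clock boundaries (0:06, 0:12, ...). One possible way to get that labelling, offered as a sketch rather than as part of the original answer, is lubridate::ceiling_date(). The column names time, g and bin_end and the placeholder date are assumptions made for illustration.

```r
library(dplyr)
library(lubridate)

# The question's data; the column names `time` and `g` are hypothetical
q <- data.frame(
  time = c("0:03", "0:05", "0:05", "0:05", "0:10", "0:10",
           "0:14", "0:18", "0:20", "0:23", "0:30"),
  g    = c("B", "A", "A", "B", "A", "B", "B", "A", "A", "B", "A")
)

# Arbitrary placeholder date, assuming all rows fall on one day
q$t <- ymd_hm(paste("2019-06-02", q$time))

# Round each time up to the next 6-minute clock boundary;
# times already on a boundary (0:18, 0:30) stay where they are
q$bin_end <- ceiling_date(q$t, unit = "6 mins")

# Count observations per interval end and group
res <- count(q, bin_end, g)
res
```

With this rounding, each bin covers the 6 minutes up to and including its label, so 0:18 is counted in the 0:18 bin rather than in the 0:24 bin.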

Now you can group by this new interval (tc) and your grouping column (g), and count the frequency of occurrences. Counting the observations in each group is a fairly common operation, so dplyr provides count() for this:

count(d, g, tc)
# A tibble: 7 x 3
  g     tc                      n
  <fct> <fct>               <int>
1 A     2019-06-02 00:03:00     2
2 A     2019-06-02 00:15:00     1
3 A     2019-06-02 00:21:00     1
4 B     2019-06-02 00:09:00     2
5 B     2019-06-02 00:15:00     1
6 B     2019-06-02 00:21:00     1
7 B     2019-06-02 00:27:00     2

If you run ?dplyr::count in the console, you'll see that count(d, g, tc) is roughly equivalent to group_by(d, g, tc) %>% summarise(n = n()).
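
To make that equivalence concrete, here is a small self-contained check on made-up data. The data frame below is a toy stand-in rather than the d built earlier, and the .groups = "drop" argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Toy data: a group column and an already-binned time column
toy <- data.frame(
  g  = c("A", "A", "B", "B", "B", "A"),
  tc = c("00:03", "00:03", "00:09", "00:09", "00:15", "00:15")
)

# Shortcut form
a <- count(toy, g, tc)

# Long form: group, summarise, then drop the grouping
b <- toy %>%
  group_by(g, tc) %>%
  summarise(n = n(), .groups = "drop")

# The two results should agree once both are plain data frames
same <- isTRUE(all.equal(as.data.frame(a), as.data.frame(b)))
same
```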
