Plotting frequency of occurrences based on start/end times in R
Problem description
I have a "trips" dataset that includes a unique trip id, and a start and end time (the specific hour and minute) of the trips. These trips were all taken on the same day. I am trying to determine the number of cars on the road at any given time and plot it as a line graph using ggplot in R. In other words, a car is "on the road" at any time in between its start and end time.
The most similar example I can find uses the following structure:
yearly_counts <- trips %>%
  count(year, trip_id)

ggplot(data = yearly_counts, mapping = aes(x = year, y = n)) +
  geom_line()
Would the best approach be to modify this structure to have a "minutesByHour_count" variable that holds a count for every minute of every hour? This seems inefficient to me, and it still doesn't solve the problem of getting the counts from the start/end times.
Is there a simpler way?
Recommended answer
Here's an example based on counting each start as an additional car, and each end as a reduction in the count:
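library(tidyverse)

df %>%
  # Reshape to one row per event (gather() still works; pivot_longer() is its newer equivalent)
  gather(type, time, c(start_hour, end_hour)) %>%
  # +1 when a trip starts, -1 when it ends
  mutate(count_chg = if_else(type == "start_hour", 1, -1)) %>%
  arrange(time) %>%
  # Running total = number of cars on the road after each event
  mutate(car_count = cumsum(count_chg)) %>%
  ggplot(aes(time, car_count)) +
  geom_step()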
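To see the mechanics on a small input, here is a minimal base R sketch of the same +1/-1 running total. The three trips and the name `trips` are made up for illustration and are not part of the original post. The tie-breaking rule (ends processed before starts at the same timestamp) is also an assumption; the pipeline above does not specify one.

```r
# Three hypothetical trips (times in decimal hours); illustrative only.
trips <- data.frame(
  trip_id    = 1:3,
  start_hour = c(8.0, 8.5, 9.0),
  end_hour   = c(9.5, 9.0, 10.0)
)

# One row per event: +1 at each start, -1 at each end.
events <- data.frame(
  time      = c(trips$start_hour, trips$end_hour),
  count_chg = c(rep(1, nrow(trips)), rep(-1, nrow(trips)))
)

# Sort by time; on ties, ends (-1) sort before starts (+1) (assumed convention).
events <- events[order(events$time, events$count_chg), ]

# The running total is the number of cars on the road after each event.
events$car_count <- cumsum(events$count_chg)
print(events)
```

The count rises by one at every start and falls by one at every end, so it returns to zero once the last trip has finished.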
Sample data:
df <- data.frame(
  uniqueID = 1:60,
  start_hour = seq(8, 12, length.out = 60),
  dur_hour = 0.05 * 1:60
)
df$end_hour <- df$start_hour + df$dur_hour
df$dur_hour <- NULL
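For comparison, the per-minute count that the question considers can be sketched directly: build a grid of minutes and count, for each one, the trips whose interval covers it. This is only a sketch. The names `grid`, `cars_on_road`, and `minute_counts` are hypothetical, and treating a car as on the road when `start <= t < end` is an assumed convention. It does work proportional to trips times minutes, whereas the cumulative-sum approach only sorts the events once.

```r
# Same sample data as above, rebuilt so this block runs on its own.
df <- data.frame(
  uniqueID   = 1:60,
  start_hour = seq(8, 12, length.out = 60)
)
df$end_hour <- df$start_hour + 0.05 * 1:60

# One grid point per minute, from the hour of the first start
# to the hour after the last end (built from whole minutes to limit rounding).
first_minute <- floor(min(df$start_hour)) * 60
last_minute  <- ceiling(max(df$end_hour)) * 60
grid <- seq(first_minute, last_minute) / 60

# Count trips covering each grid point: start <= t < end (assumed convention).
cars_on_road <- vapply(
  grid,
  function(t) sum(df$start_hour <= t & t < df$end_hour),
  integer(1)
)

minute_counts <- data.frame(time = grid, car_count = cars_on_road)
head(minute_counts)
```

`minute_counts` has the same shape as `yearly_counts` in the question, so it could be passed to `ggplot(aes(time, car_count)) + geom_line()` in the same way.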