Aggregation by time period in lubridate

Question

This question asks about aggregation by time period in R, which pandas calls resampling. The most useful answer there uses the xts package to group by a given time period and apply a function such as sum() or mean().

One of the comments suggested there was something similar in lubridate, but didn't elaborate. Can someone provide an idiomatic example using lubridate? I've read through the lubridate vignette a couple of times and can imagine some combination of lubridate and plyr, but I want to make sure there isn't an easier way that I'm missing.

To make the example more real, let's say I want the daily sum of bicycles traveling northbound from this dataset:

library(lubridate)
library(reshape2)

bikecounts <- read.csv(url("http://data.seattle.gov/api/views/65db-xm6k/rows.csv?accessType=DOWNLOAD"), header=TRUE, stringsAsFactors=FALSE)
names(bikecounts) <- c("Date", "Northbound", "Southbound")

The data looks like this:

> head(bikecounts)
                    Date Northbound Southbound
1 10/02/2012 12:00:00 AM          0          0
2 10/02/2012 01:00:00 AM          0          0
3 10/02/2012 02:00:00 AM          0          0
4 10/02/2012 03:00:00 AM          0          0
5 10/02/2012 04:00:00 AM          0          0
6 10/02/2012 05:00:00 AM          0          0

Recommended answer

I don't know why you'd use lubridate for this. If you're just looking for something less awesome than xts, you could try this:

tapply(bikecounts$Northbound, as.Date(bikecounts$Date, format="%m/%d/%Y"), sum)

Basically, you just need to split by Date, then apply a function.
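As a minimal sketch of that split-apply idea, the example below builds a small toy data frame (the values are invented, and the name `toy` is hypothetical, chosen only to mimic the shape of bikecounts) and computes the daily sums twice: once with an explicit split() and sapply(), and once with tapply().

```r
# Hypothetical toy data in the same shape as bikecounts (values invented)
toy <- data.frame(
  Date = c("10/02/2012 12:00:00 AM", "10/02/2012 01:00:00 PM",
           "10/03/2012 08:00:00 AM", "10/03/2012 09:00:00 AM"),
  Northbound = c(1, 4, 10, 20),
  stringsAsFactors = FALSE
)

# as.Date() reads only the part matched by the format and ignores the time
day <- as.Date(toy$Date, format = "%m/%d/%Y")

# Explicit split-apply: one numeric vector per day, then sum each piece
daily_split <- sapply(split(toy$Northbound, day), sum)

# tapply() performs the same split-apply in a single call
daily_tapply <- tapply(toy$Northbound, day, sum)
```

Both results are named by day, so either form can be used depending on whether the intermediate list of groups is needed.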

lubridate could be used for creating a grouping factor for split-apply problems. So, for example, if you want the sum for each month (ignoring year):

tapply(bikecounts$Northbound, month(mdy_hms(bikecounts$Date)), sum)
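If the year should not be ignored, one possible variation (not part of the original answer) is to group on lubridate's floor_date() instead of month(). The sketch below uses an invented toy data frame; the names `toy` and `when` are hypothetical.

```r
library(lubridate)

# Hypothetical toy data spanning two Octobers (values invented)
toy <- data.frame(
  Date = c("10/02/2012 12:00:00 AM", "10/15/2012 01:00:00 PM",
           "11/03/2012 08:00:00 AM", "10/01/2013 09:00:00 AM"),
  Northbound = c(1, 4, 10, 20),
  stringsAsFactors = FALSE
)

when <- mdy_hms(toy$Date)  # POSIXct, UTC by default

# month() drops the year, so October 2012 and October 2013 share one group
by_month <- tapply(toy$Northbound, month(when), sum)

# floor_date() rounds down to the start of the month and keeps the year
by_year_month <- tapply(toy$Northbound,
                        as.Date(floor_date(when, "month")), sum)
```

Here by_month has two groups (10 and 11), while by_year_month has three, one per calendar month that appears in the data.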

But, it's just using wrappers for base R functions, and in the case of the OP, I think the base R function as.Date is the easiest (as evidenced by the fact that the other Answers also ignored your request to use lubridate ;-) ).

Something that wasn't covered by the answer to the other question linked in the OP is split.xts. period.apply splits an xts object at endpoints and applies a function to each group. You can find endpoints that are useful for a given task with the endpoints function. For example, if you have an xts object, x, then endpoints(x, "months") would give you the row numbers that are the last row of each month. split.xts leverages that to split an xts object: split(x, "months") would return a list of xts objects where each component covers a different month.

Although split.xts() and endpoints() are primarily intended for xts objects, they also work on some other objects, including plain time-based vectors. Even if you don't want to use xts objects, you may still find uses for endpoints() because of its convenience or its speed (it is implemented in C):

> split.xts(as.Date("1970-01-01") + 1:10, "weeks")
[[1]]
[1] "1970-01-02" "1970-01-03" "1970-01-04"

[[2]]
[1] "1970-01-05" "1970-01-06" "1970-01-07" "1970-01-08" "1970-01-09"
[6] "1970-01-10" "1970-01-11"

> endpoints(as.Date("1970-01-01") + 1:10, "weeks")
[1]  0  3 10

I think lubridate's best use in this problem is for parsing the "Date" strings into POSIXct objects, that is, the mdy_hms function in this case.

Here's an xts solution that uses lubridate to parse the "Date" strings.

library(xts)
x <- xts(bikecounts[, -1], mdy_hms(bikecounts$Date))
# Note: sum() adds Northbound and Southbound together within each day
period.apply(x, endpoints(x, "days"), sum)
apply.daily(x, sum) # identical to above

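For this specific task, xts also has an optimized period.sum function that is very fast (implemented in compiled code; Fortran at the time of this answer). It accepts only single-column input, so pass one column at a time:

period.sum(x[, "Northbound"], endpoints(x, "days"))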
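For readers who want to try the xts calls without downloading the dataset, here is a small self-contained sketch on invented data. The object names (`idx`, `ep`, `daily_cols`, `daily_north`) are hypothetical, and using colSums to keep one total per column is a variation on the answer's code rather than something the answer itself shows.

```r
library(xts)

# Hypothetical counts over two days (values invented)
idx <- as.POSIXct(c("2012-10-02 00:00:00", "2012-10-02 13:00:00",
                    "2012-10-03 08:00:00", "2012-10-03 09:00:00"),
                  tz = "UTC")
x <- xts(cbind(Northbound = c(1, 4, 10, 20),
               Southbound = c(2, 3, 5, 7)),
         order.by = idx)

# Row numbers of the last observation in each day, with a leading 0
ep <- endpoints(x, "days")

# colSums keeps a separate daily total for each column
daily_cols <- period.apply(x, ep, colSums)

# period.sum operates on one column at a time
daily_north <- period.sum(x[, "Northbound"], ep)
```

Each result is an xts object indexed by the last timestamp of the day, so it can be merged or plotted like any other xts series.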
