R中每年的数据摘要 [英] Summary of data for each year in R
问题描述
我有两列数据。一栏中是日期,另一栏中是流量数据。
I have a data with two columns. In one column it is date and in another column it is flow data.
我能够将数据读取为日期和流量数据。我使用了以下代码:
I was able to read the data as date and flow data. I used the following code:
creek <- read.csv("creek.csv")
library(ggplot2)
creek[1:10,]
colnames(creek) <- c("date","flow")
creek$date <- as.Date(creek$date, "%m/%d/%Y")
指向我的数据的链接为 https://www.dropbox.com/s/eqpena3nk82x67e/creek.csv
The link to my data is https://www.dropbox.com/s/eqpena3nk82x67e/creek.csv
现在,我想查找每年的摘要。我想特别了解均值,中位数,最大值等。
Now, I want to find the summary of each year. I want to especially know mean, median, maximum etc.
谢谢。
问候,
Jdbaba
Regards, Jdbaba
推荐答案
Base R
这是从R开始的两种方法。
Base R
Here are two methods from base R.
第一种使用 cut
,分割
和 lapply
以及摘要
。
The first uses cut
, split
and lapply
along with summary
.
creekFlowSummary <- lapply(split(creek, cut(creek$date, "1 year")),
function(x) summary(x[2]))
这将创建列表
。您可以通过访问相应的列表索引或名称来查看不同年份的摘要。
This creates a list
. You can view the summaries of different years by accessing the corresponding list index or name.
creekFlowSummary[1]
# $`1999-01-01`
# flow
# Min. :0.3187
# 1st Qu.:0.3965
# Median :0.4769
# Mean :0.6366
# 3rd Qu.:0.5885
# Max. :7.2560
#
creekFlowSummary["2000-01-01"]
# $`2000-01-01`
# flow
# Min. :0.1370
# 1st Qu.:0.1675
# Median :0.2081
# Mean :0.2819
# 3rd Qu.:0.2837
# Max. :2.3800
第二个使用汇总
:
aggregate(flow ~ cut(date, "1 year"), creek, summary)
# cut(date, "1 year") flow.Min. flow.1st Qu. flow.Median flow.Mean flow.3rd Qu. flow.Max.
# 1 1999-01-01 0.3187 0.3965 0.4770 0.6366 0.5885 7.2560
# 2 2000-01-01 0.1370 0.1675 0.2081 0.2819 0.2837 2.3800
# 3 2001-01-01 0.1769 0.2062 0.2226 0.2950 0.2574 2.9220
# 4 2002-01-01 0.1279 0.1781 0.2119 0.5346 0.4966 14.3900
# 5 2003-01-01 0.3492 0.4761 0.7173 1.0350 1.0840 10.1500
# 6 2004-01-01 0.4178 0.5379 0.6524 0.9691 0.9020 11.7100
# 7 2005-01-01 0.4722 0.6094 0.7279 1.2340 1.0900 17.7200
# 8 2006-01-01 0.2651 0.3275 0.4282 0.5459 0.5758 3.3510
# 9 2007-01-01 0.2784 0.3557 0.4041 0.6331 0.6125 9.6290
# 10 2008-01-01 0.4131 0.5430 0.6477 0.8792 0.9540 4.5960
# 11 2009-01-01 0.3877 0.4572 0.5945 0.8465 0.8309 6.3830
请注意聚合
解决方案:摘要信息的全部是一个矩阵。在输出上查看 str
以了解我的意思。
Be careful with the aggregate
solution though: All of the summary information is a single matrix. View str
on the output to see what I mean.
当然,还有其他方法可以执行此操作。一种方法是使用 xts
包。
There are, of course other ways to do this. One way is to use the xts
package.
首先,将数据转换为 xts
:
library(xts)
creekx <- xts(creek$flow, order.by=creek$date)
然后,使用 apply.yearly
以及您感兴趣的任何功能。
Then, use apply.yearly
and whatever functions you are interested in.
这里是每年的平均值:
apply.yearly(creekx, mean)
# [,1]
# 1999-12-31 0.6365604
# 2000-12-31 0.2819057
# 2001-12-31 0.2950348
# 2002-12-31 0.5345666
# 2003-12-31 1.0351742
# 2004-12-31 0.9691180
# 2005-12-31 1.2338066
# 2006-12-31 0.5458652
# 2007-12-31 0.6331271
# 2008-12-31 0.8792396
# 2009-09-30 0.8465300
每年的最大值:
apply.yearly(creekx, max)
# [,1]
# 1999-12-31 7.256
# 2000-12-31 2.380
# 2001-12-31 2.922
# 2002-12-31 14.390
# 2003-12-31 10.150
# 2004-12-31 11.710
# 2005-12-31 17.720
# 2006-12-31 3.351
# 2007-12-31 9.629
# 2008-12-31 4.596
# 2009-09-30 6.383
或者将它们像这样放在一起: apply.yearly(creekx,function(x)cbind(mean(x),sum(x),max(x)))
Or, put them together like this: apply.yearly(creekx, function(x) cbind(mean(x), sum(x), max(x)))
data.table
包也可能对您很感兴趣,特别是在您处理大量数据的情况下。这是 data.table
的方法。关键是在读取以下数据时,请在日期列上使用 as.IDate
:
The data.table
package may also be of interest for you, particularly if you are dealing with a lot of data. Here's a data.table
approach. The key is to use as.IDate
on your "date" column while you are reading your data in:
library(data.table)
DT <- data.table(date = as.IDate(creek$date), creek[-1])
DT[, list(mean = mean(flow),
tot = sum(flow),
max = max(flow)),
by = year(date)]
# year mean tot max
# 1: 1999 0.6365604 104.3959 7.256
# 2: 2000 0.2819057 103.1775 2.380
# 3: 2001 0.2950348 107.6877 2.922
# 4: 2002 0.5345666 195.1168 14.390
# 5: 2003 1.0351742 377.8386 10.150
# 6: 2004 0.9691180 354.6972 11.710
# 7: 2005 1.2338066 450.3394 17.720
# 8: 2006 0.5458652 199.2408 3.351
# 9: 2007 0.6331271 231.0914 9.629
# 10: 2008 0.8792396 321.8017 4.596
# 11: 2009 0.8465300 231.1027 6.383
这篇关于R中每年的数据摘要的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!