R: Count Number of Observations within a group
Problem description
Using the R programming language, I am trying to follow this tutorial over here: Count number of observations per day, month and year in R
I create data at daily intervals and then took weekly sums of this data. To the "y.week" file, I want to add a "count" column that lists the number of observations in each week.
Here is the code below I am using:
#load libraries
library(xts)
library(ggplot2)
#create data
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")
property_damages_in_dollars <- rnorm(731,100,10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)
#aggregate and count by week
y.week <-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%W-%y"),data=final_data, FUN=sum)
counts_week <- data.frame(table(as.Date(index(y.week))))
y.week$count = count_week
But I don't think this is correct.
I then tried to do the same thing per month:
#aggregate and count by month
y.mon<-aggregate(property_damages_in_dollars~format(as.Date(date_decision_made),
format="%Y/%m"),data=final_data, FUN=sum)
counts_mon <- data.frame(table(as.Date(index(y.mon))))
y.mon$count = count_mon
Normally, I would have used the "dplyr" library to count by group (count by month, count by week), but I am not sure how to "tell" dplyr to consider observations in the same week (or in the same month) as a "group".
Can someone please tell me what I am doing wrong?
Thanks
Possible answer (provided by Ronak Shah):
Weekly:
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")
property_damages_in_dollars <- rnorm(731,100,10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)
#load dplyr for %>%, mutate, group_by and summarise
library(dplyr)
final_data %>%
mutate(date_decision_made = as.Date(date_decision_made)) %>%
group_by(week = format(date_decision_made, "%W-%y")) %>%
summarise( total = sum(property_damages_in_dollars, na.rm = TRUE), Count = n())
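One thing to keep in mind with the "%W-%y" label is that a week spanning New Year ends up in two groups, so a few weekly counts will be below 7. The sketch below compares it with an ISO 8601 label; it assumes that format() supports "%G-%V" on your platform, and the names dates, week_counts and iso_counts are only illustrative.

```r
# Same date range as the example above, kept as Date objects
dates <- seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "day")

# "%W-%y" changes label on 1 January, so a week that spans
# New Year is split into two groups
week_counts <- table(format(dates, "%W-%y"))

# Assumption: "%G-%V" (ISO 8601 year and week) is supported by format()
# on your platform; it keeps a Monday-to-Sunday week in a single group
iso_counts <- table(format(dates, "%G-%V"))

# Groups with fewer than 7 observations under each labelling
week_counts[week_counts < 7]
iso_counts[iso_counts < 7]
```

Either label can be passed to group_by() in the same way; which one is appropriate depends on how you want partial weeks at year boundaries to be reported.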
Monthly:
date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"),by="day")
date_decision_made <- format(as.Date(date_decision_made), "%Y/%m/%d")
property_damages_in_dollars <- rnorm(731,100,10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)
final_data %>%
mutate(date_decision_made = as.Date(date_decision_made)) %>%
group_by(month = format(date_decision_made, "%Y-%m")) %>%
summarise( total = sum(property_damages_in_dollars, na.rm = TRUE), Count = n())
Recommended answer
It would be better to keep objects in their natural form, for example keeping dates as dates instead of strings. You can then use
library(dplyr)
final_data %>%
mutate(date_decision_made = as.Date(date_decision_made)) %>%
add_count(week = format(date_decision_made, "%W-%y"), name = 'Count')
Using add_count is a shortcut for group_by + mutate with n():
final_data %>%
mutate(date_decision_made = as.Date(date_decision_made)) %>%
group_by(week = format(date_decision_made, "%W-%y")) %>%
mutate(Count = n())
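For comparison, the same two results (a per-row count and a one-row-per-week summary) can be sketched in base R without dplyr. This is only an illustration under stated assumptions: the seed is arbitrary, and the column names week, Count and count are chosen here for the example.

```r
# Rebuild the example data, keeping dates as Date objects
set.seed(1)  # arbitrary seed, only so the random values are reproducible
date_decision_made <- seq(as.Date("2014-01-01"), as.Date("2016-01-01"), by = "day")
property_damages_in_dollars <- rnorm(length(date_decision_made), 100, 10)
final_data <- data.frame(date_decision_made, property_damages_in_dollars)

# Week label, using the same format string as in the answer
final_data$week <- format(final_data$date_decision_made, "%W-%y")

# Per-row group size, analogous to add_count()
final_data$Count <- ave(seq_along(final_data$week), final_data$week, FUN = length)

# One row per week with the weekly total
y.week <- aggregate(property_damages_in_dollars ~ week, data = final_data, FUN = sum)

# Attach the number of observations per week, matched by week label
week_sizes <- table(final_data$week)
y.week$count <- as.vector(week_sizes[as.character(y.week$week)])

head(y.week)
```

Matching the counts by week label, rather than relying on row order, keeps the count column aligned with the totals even if the two results are sorted differently.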