Dividing column values into ranges and aggregating dates by month to count how often each range occurs in that month
Problem description
I have a data frame that contains a date column of integer type. I also want to split price into ranges of 10,000 and then count how many prices fall into each range in each month.
> df
date values price
11/25/18 a 10000
11/30/18 b 30500
12/4/18 a 20000
12/5/18 b 65000
12/5/18 a 50000
12/6/18 b 35000
12/6/18 c 40000
12/6/18 a 45000
12/6/18 a 30000
12/7/18 b 80000
12/7/18 c 85000
12/7/18 a 90000
12/9/18 b 20000
12/12/18 a 32500
12/12/18 c 40200
12/13/18 b 56000
1/9/19 a 82000
1/9/19 c 63000
1/9/19 b 20000
1/10/19 d 25000
1/10/19 d 34000
1/10/19 d 13020
1/10/19 a 50000
1/11/19 c 24300
1/11/19 d 40000
2/1/19 a 95000
2/10/19 a 20000
2/13/19 b 10000
3/14/19 d 30000
3/17/19 c 45000
5/4/19 d 18000
5/5/19 c 12000
5/6/19 d 90000
5/31/19 a 90000
I tried the following code, but I cannot aggregate by month:
df %>%
  group_by(date) %>%
  count(values)
This gives me the counts for each day. To aggregate by month, I tried:
df %>%
  group_by(month = month(date)) %>%
  count(values)
When I ran this code to aggregate the dates by month, I got the following error:
Error in as.POSIXlt.character(as.character(x), ...) : the character string is not in a standard unambiguous format
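The error most likely comes from the date column itself. In the question's data it is a factor of m/d/yy strings such as "11/25/18", and, as the error message shows, the month() call ends up passing those strings to as.POSIXlt(), which without an explicit format only tries standard layouts such as "2018-11-25". A minimal base R sketch, assuming every date uses the same m/d/yy layout, parses with an explicit format first and only then derives the month:

```r
# Dates stored as m/d/yy strings (a factor in the question's data)
dates <- factor(c("11/25/18", "12/4/18", "1/9/19"))

# Spell out the format; %y is the two-digit year, so "18" becomes 2018
parsed <- as.Date(as.character(dates), format = "%m/%d/%y")

# Month number and a month-year label suitable for grouping
month_num  <- as.integer(format(parsed, "%m"))
month_year <- format(parsed, "%m-%y")
print(month_year)  # "11-18" "12-18" "01-19"
```

Once the column is a proper Date, grouping on a month label like month_year works with group_by() or any other grouping function.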
To group the price column into steps of 10,000, I am using the following code:
library(dplyr)
library(tidyr)

tally(group_by(df, values,
               price = cut(price, breaks = seq(10000, 200000, by = 10000)))) %>%
  ungroup() %>%
  spread(price, n, fill = 0)
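One detail worth checking in the binning step: by default cut() builds right-closed intervals such as (10000,20000], so a price of exactly 10000 (the first row of the question's data) falls outside every interval and becomes NA. The sketch below, using the first four prices from the question, shows that, how include.lowest = TRUE fixes it, and one way to build readable labels. Formatting the breaks with scientific = FALSE is a precaution, since pasting doubles directly can render 100000 as "1e+05".

```r
breaks <- seq(10000, 200000, by = 10000)
prices <- c(10000, 30500, 20000, 65000)  # first four prices in the question's data

# Default: intervals are open on the left, e.g. (10000,20000],
# so the lowest price, exactly 10000, matches no interval and becomes NA
default_bins <- cut(prices, breaks = breaks)

# include.lowest = TRUE closes the first interval: [10000,20000]
closed_bins <- cut(prices, breaks = breaks, include.lowest = TRUE)

# Readable "lower-upper" labels without scientific notation
lower <- format(head(breaks, -1), scientific = FALSE, trim = TRUE)
upper <- format(breaks[-1], scientific = FALSE, trim = TRUE)
labelled <- cut(prices, breaks = breaks, include.lowest = TRUE,
                labels = paste(lower, upper, sep = "-"))
print(as.character(labelled))  # "10000-20000" "30000-40000" "10000-20000" "60000-70000"
```

Note that 20000 lands in the 10000-20000 bin because intervals are right-closed. If boundary prices should count toward the higher bin instead, right = FALSE gives left-closed intervals such as [10000,20000); with right = FALSE, include.lowest = TRUE closes the last interval rather than the first.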
Problem:
I am not able to combine this with the code that aggregates dates by month and then spread the data by price group. The expected output looks like this:
date values 10k-20k 20k-30k 30k-40k 40k-50k 50k-60k 60k-70k 70k-80k 80k-90k
11/18 a 1
11/18 b 1
12/18 a 1 1 1 1 1
12/18 b 1 1 1 1
12/18 c 1 1 1
...
Recommended answer
We can extract the month-year from the date column, use cut to break price into different buckets, count the frequency and then spread to wide format.
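library(dplyr)
# Bin edges 10000, 20000, ..., 200000
cut_group <- seq(10000, 200000, by = 10000)
df %>%
  mutate(date = as.Date(date, "%m/%d/%y"),        # parse "11/25/18"-style dates
         month_year = format(date, "%m-%y"),      # month label, e.g. "11-18"
         groups = cut(price, cut_group, include.lowest = TRUE,
                      labels = paste(cut_group[-length(cut_group)], cut_group[-1], sep = "-"))) %>%
  count(values, month_year, groups) %>%           # one row per value, month and price bin
  tidyr::spread(groups, n, fill = 0)              # one column per price bin, 0 where absent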
# values month_year `10000-20000` `20000-30000` `30000-40000` `40000-50000`
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 a 01-19 0 0 0 1
# 2 a 02-19 1 0 0 0
# 3 a 05-19 0 0 0 0
# 4 a 11-18 1 0 0 0
#.....
Data
df <- structure(list(date = structure(c(4L, 5L, 8L, 9L, 9L, 10L, 10L,
10L, 10L, 11L, 11L, 11L, 12L, 6L, 6L, 7L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 2L, 2L, 13L, 14L, 15L, 16L, 17L, 19L, 20L, 21L, 18L), .Label =
c("1/10/19", "1/11/19", "1/9/19", "11/25/18", "11/30/18", "12/12/18", "12/13/18",
"12/4/18", "12/5/18", "12/6/18", "12/7/18", "12/9/18", "2/1/19",
"2/10/19", "2/13/19", "3/14/19", "3/17/19", "5/31/19", "5/4/19",
"5/5/19", "5/6/19"), class = "factor"), values = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 3L, 1L, 1L, 2L, 3L, 1L, 2L, 1L, 3L, 2L, 1L,
3L, 2L, 4L, 4L, 4L, 1L, 3L, 4L, 1L, 1L, 2L, 4L, 3L, 4L, 3L, 4L,
1L), .Label = c("a", "b", "c", "d"), class = "factor"), price = c(10000L,
30500L, 20000L, 65000L, 50000L, 35000L, 40000L, 45000L, 30000L,
80000L, 85000L, 90000L, 20000L, 32500L, 40200L, 56000L, 82000L,
63000L, 20000L, 25000L, 34000L, 13020L, 50000L, 24300L, 40000L,
95000L, 20000L, 10000L, 30000L, 45000L, 18000L, 12000L, 90000L,
90000L)), class = "data.frame", row.names = c(NA, -34L))
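For comparison, the same counts can be produced in base R with xtabs() and ftable() instead of count() and spread(); this is a swapped-in alternative, not the answer's method. The sketch uses a handful of rows copied from the question's data and labels months as "%Y-%m" so they sort chronologically (with "%m-%y", as in the output above, "01-19" sorts before "11-18"). Unlike count(), xtabs() keeps every month/value/bin combination, including all-zero rows and columns.

```r
# A subset of the question's df (rows copied from the data above)
df <- data.frame(
  date   = c("11/25/18", "11/30/18", "12/4/18", "12/5/18", "12/5/18", "1/9/19", "2/10/19"),
  values = c("a", "b", "a", "b", "a", "a", "a"),
  price  = c(10000, 30500, 20000, 65000, 50000, 82000, 20000)
)

breaks <- seq(10000, 200000, by = 10000)
labels <- paste(format(head(breaks, -1), scientific = FALSE, trim = TRUE),
                format(breaks[-1], scientific = FALSE, trim = TRUE), sep = "-")

# "%Y-%m" labels sort chronologically as plain strings
df$month_year <- format(as.Date(df$date, format = "%m/%d/%y"), "%Y-%m")
df$groups <- cut(df$price, breaks = breaks, labels = labels, include.lowest = TRUE)

# Three-way contingency table: month x value x price bin
counts <- xtabs(~ month_year + values + groups, data = df)

# Flatten it: rows are month/value pairs, columns are price bins
flat <- ftable(counts)
print(flat)
```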