R:计算指定时间范围内不同类别的数量 [英] R: calculate number of distinct categories in the specified time frame
问题描述
这里有一些虚拟数据:
user_id date category
27 2016-01-01 apple
27 2016-01-03 apple
27 2016-01-05 pear
27 2016-01-07 plum
27 2016-01-10 apple
27 2016-01-14 pear
27 2016-01-16 plum
11 2016-01-01 apple
11 2016-01-03 pear
11 2016-01-05 pear
11 2016-01-07 pear
11 2016-01-10 apple
11 2016-01-14 apple
11 2016-01-16 apple
我想计算每个 user_id
$ c> categories
I'd like to calculate for each user_id
the number of distinct categories
in the specified time period (e.g. in the past 7, 14 days), including the current order
$
user_id date category distinct_7 distinct_14
27 2016-01-01 apple 1 1
27 2016-01-03 apple 1 1
27 2016-01-05 pear 2 2
27 2016-01-07 plum 3 3
27 2016-01-10 apple 3 3
27 2016-01-14 pear 3 3
27 2016-01-16 plum 3 3
11 2016-01-01 apple 1 1
11 2016-01-03 pear 2 2
11 2016-01-05 pear 2 2
11 2016-01-07 pear 2 2
11 2016-01-10 apple 2 2
11 2016-01-14 apple 2 2
11 2016-01-16 apple 1 2
问题此处或此处,但它没有提到计算指定时间段的累积唯一值。非常感谢您的帮助!
I posted similar questions here or here, however none of it referred to counting cumulative unique values for the specified time period. Thanks a lot for your help!
推荐答案
在tidyverse中,您可以使用 map_int
遍历一组值,并简化为 sapply
或 vapply
的整数。通过比较来计算对象子集的 n_distinct
(如 length(unique(...))
之间的助手
,从该日减去适当的金额设置最小值,并设置
In the tidyverse, you can use map_int
to iterate over a set of values and simplify to an integer à la sapply
or vapply
. Count distinct occurrences with n_distinct
(like length(unique(...))
) of an object subset by comparisons or the helper between
, with a minimum set by the appropriate amount subtracted from that day, and you're set.
library(tidyverse)
df %>% group_by(user_id) %>%
mutate(distinct_7 = map_int(date, ~n_distinct(category[between(date, .x - 7, .x)])),
distinct_14 = map_int(date, ~n_distinct(category[between(date, .x - 14, .x)])))
## Source: local data frame [14 x 5]
## Groups: user_id [2]
##
## user_id date category distinct_7 distinct_14
## <int> <date> <fctr> <int> <int>
## 1 27 2016-01-01 apple 1 1
## 2 27 2016-01-03 apple 1 1
## 3 27 2016-01-05 pear 2 2
## 4 27 2016-01-07 plum 3 3
## 5 27 2016-01-10 apple 3 3
## 6 27 2016-01-14 pear 3 3
## 7 27 2016-01-16 plum 3 3
## 8 11 2016-01-01 apple 1 1
## 9 11 2016-01-03 pear 2 2
## 10 11 2016-01-05 pear 2 2
## 11 11 2016-01-07 pear 2 2
## 12 11 2016-01-10 apple 2 2
## 13 11 2016-01-14 apple 2 2
## 14 11 2016-01-16 apple 1 2
这篇关于R:计算指定时间范围内不同类别的数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!