R: calculate number of distinct categories in the specified time frame

Views: 164 Published: 2017/3/12 13:04:45 Tags: r data.table dplyr distinct-values

Question

Here is some dummy data:

  user_id       date category
       27 2016-01-01    apple
       27 2016-01-03    apple
       27 2016-01-05     pear
       27 2016-01-07     plum
       27 2016-01-10    apple
       27 2016-01-14     pear
       27 2016-01-16     plum
       11 2016-01-01    apple
       11 2016-01-03     pear
       11 2016-01-05     pear
       11 2016-01-07     pear
       11 2016-01-10    apple
       11 2016-01-14    apple
       11 2016-01-16    apple

I'd like to calculate, for each user_id, the number of distinct categories in a specified time period (e.g. the past 7 or 14 days), including the current order.

The desired output looks like this:

  user_id       date category distinct_7 distinct_14
       27 2016-01-01    apple          1           1
       27 2016-01-03    apple          1           1
       27 2016-01-05     pear          2           2
       27 2016-01-07     plum          3           3
       27 2016-01-10    apple          3           3
       27 2016-01-14     pear          3           3
       27 2016-01-16     plum          3           3
       11 2016-01-01    apple          1           1
       11 2016-01-03     pear          2           2
       11 2016-01-05     pear          2           2
       11 2016-01-07     pear          2           2
       11 2016-01-10    apple          2           2
       11 2016-01-14    apple          2           2
       11 2016-01-16    apple          1           2

I posted similar questions here and here, but none of them covered counting cumulative unique values over a specified time period. Thanks a lot for your help!

Recommended answer

In the tidyverse, you can use map_int to iterate over a set of values and simplify the result to an integer vector, à la sapply or vapply. Count distinct occurrences with n_distinct (like length(unique(...))) on a subset of category selected by comparisons or the helper between, with the lower bound set by subtracting the appropriate number of days from that row's date, and you're set.

    library(tidyverse)

    # df is the data frame from the question, with date stored as Date
    df %>%
      group_by(user_id) %>%
      mutate(
        # distinct categories in [date - 7, date]; between() includes both bounds
        distinct_7 = map_int(date, ~n_distinct(category[between(date, .x - 7, .x)])),
        # same idea with a 14-day lookback
        distinct_14 = map_int(date, ~n_distinct(category[between(date, .x - 14, .x)]))
      )

    ## Source: local data frame [14 x 5]
    ## Groups: user_id [2]
    ##
    ##    user_id       date category distinct_7 distinct_14
    ##      <int>     <date>   <fctr>      <int>       <int>
    ## 1       27 2016-01-01    apple          1           1
    ## 2       27 2016-01-03    apple          1           1
    ## 3       27 2016-01-05     pear          2           2
    ## 4       27 2016-01-07     plum          3           3
    ## 5       27 2016-01-10    apple          3           3
    ## 6       27 2016-01-14     pear          3           3
    ## 7       27 2016-01-16     plum          3           3
    ## 8       11 2016-01-01    apple          1           1
    ## 9       11 2016-01-03     pear          2           2
    ## 10      11 2016-01-05     pear          2           2
    ## 11      11 2016-01-07     pear          2           2
    ## 12      11 2016-01-10    apple          2           2
    ## 13      11 2016-01-14    apple          2           2
    ## 14      11 2016-01-16    apple          1           2
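
For anyone who wants to try the idea without loading the tidyverse, here is a rough base R sketch of the same windowed count. It rebuilds the dummy data from the question so that it runs on its own. The helper name count_distinct_window is made up for this illustration, and storing category as character rather than factor is an assumption on my part. It is not the answer's code, just the same logic written with vapply.

```r
# Rebuild the dummy data from the question (category kept as character: an assumption)
dates <- c("2016-01-01", "2016-01-03", "2016-01-05", "2016-01-07",
           "2016-01-10", "2016-01-14", "2016-01-16")
df <- data.frame(
  user_id  = rep(c(27L, 11L), each = 7),
  date     = as.Date(rep(dates, times = 2)),
  category = c("apple", "apple", "pear", "plum", "apple", "pear", "plum",
               "apple", "pear", "pear", "pear", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, count the distinct categories the same
# user has in the closed window [date - days, date]
count_distinct_window <- function(data, days) {
  vapply(seq_len(nrow(data)), function(i) {
    in_window <- data$user_id == data$user_id[i] &
      data$date >= data$date[i] - days &
      data$date <= data$date[i]
    length(unique(data$category[in_window]))
  }, integer(1))
}

df$distinct_7  <- count_distinct_window(df, 7)
df$distinct_14 <- count_distinct_window(df, 14)
print(df)
```

This compares every row against every other row, so it is only sensible for small data; for large tables a grouped or join-based approach would likely scale better.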
