R:计算指定时间范围内不同类别的数量 [英] R: calculate number of distinct categories in the specified time frame

查看:164
本文介绍了R:计算指定时间范围内不同类别的数量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这里有一些虚拟数据:

  user_id       date category
       27 2016-01-01    apple
       27 2016-01-03    apple
       27 2016-01-05     pear
       27 2016-01-07     plum
       27 2016-01-10    apple
       27 2016-01-14     pear
       27 2016-01-16     plum
       11 2016-01-01    apple
       11 2016-01-03     pear
       11 2016-01-05     pear
       11 2016-01-07     pear
       11 2016-01-10    apple
       11 2016-01-14    apple
       11 2016-01-16    apple

我想计算每个 user_id $ c> categories

I'd like to calculate for each user_id the number of distinct categories in the specified time period (e.g. in the past 7, 14 days), including the current order

$

 user_id       date category distinct_7 distinct_14
      27 2016-01-01    apple          1           1
      27 2016-01-03    apple          1           1
      27 2016-01-05     pear          2           2
      27 2016-01-07     plum          3           3
      27 2016-01-10    apple          3           3
      27 2016-01-14     pear          3           3
      27 2016-01-16     plum          3           3
      11 2016-01-01    apple          1           1
      11 2016-01-03     pear          2           2
      11 2016-01-05     pear          2           2
      11 2016-01-07     pear          2           2
      11 2016-01-10    apple          2           2
      11 2016-01-14    apple          2           2
      11 2016-01-16    apple          1           2

问题此处此处,但它没有提到计算指定时间段的累积唯一值。非常感谢您的帮助!

I posted similar questions here or here, however none of it referred to counting cumulative unique values for the specified time period. Thanks a lot for your help!

推荐答案

在tidyverse中,您可以使用 map_int 遍历一组值,并简化为 sapply vapply 的整数。通过比较来计算对象子集的 n_distinct (如 length(unique(...)) 之间的助手,从该日减去适当的金额设置最小值,并设置

In the tidyverse, you can use map_int to iterate over a set of values and simplify to an integer à la sapply or vapply. Count distinct occurrences with n_distinct (like length(unique(...))) of an object subset by comparisons or the helper between, with a minimum set by the appropriate amount subtracted from that day, and you're set.

library(tidyverse)

df %>% group_by(user_id) %>% 
    mutate(distinct_7  = map_int(date, ~n_distinct(category[between(date, .x - 7, .x)])), 
           distinct_14 = map_int(date, ~n_distinct(category[between(date, .x - 14, .x)])))

## Source: local data frame [14 x 5]
## Groups: user_id [2]
## 
##    user_id       date category distinct_7 distinct_14
##      <int>     <date>   <fctr>      <int>       <int>
## 1       27 2016-01-01    apple          1           1
## 2       27 2016-01-03    apple          1           1
## 3       27 2016-01-05     pear          2           2
## 4       27 2016-01-07     plum          3           3
## 5       27 2016-01-10    apple          3           3
## 6       27 2016-01-14     pear          3           3
## 7       27 2016-01-16     plum          3           3
## 8       11 2016-01-01    apple          1           1
## 9       11 2016-01-03     pear          2           2
## 10      11 2016-01-05     pear          2           2
## 11      11 2016-01-07     pear          2           2
## 12      11 2016-01-10    apple          2           2
## 13      11 2016-01-14    apple          2           2
## 14      11 2016-01-16    apple          1           2

这篇关于R:计算指定时间范围内不同类别的数量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆