Calculating simple retention in R

Question

For the dataset test, my objective is to find out, period by period, how many unique users carried over from one period to the next.

> test
   user_id period
1        1      1
2        5      1
3        1      1
4        3      1
5        4      1
6        2      2
7        3      2
8        2      2
9        3      2
10       1      2
11       5      3
12       5      3
13       2      3
14       1      3
15       4      3
16       5      4
17       5      4
18       5      4
19       4      4
20       3      4

For example, in the first period there were four unique users (1, 3, 4, and 5), two of whom were active in the second period, so the retention rate would be 0.5. In the second period there were three unique users, two of whom were active in the third period, so the retention rate would be 0.666, and so on. How would one find the percentage of unique users who are active in the following period? Any suggestions would be appreciated.

The desired output looks like this, with each rate listed against the later of the two periods:

> output
  period retention
1      1        NA
2      2     0.500
3      3     0.666
4      4     0.500

Test data:

> dput(test)
structure(list(user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 
2, 1, 4, 5, 5, 5, 4, 3), period = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)), .Names = c("user_id", "period"
), row.names = c(NA, -20L), class = "data.frame")


Recommended answer

This isn't very elegant, but it seems to work. Assuming df is the data frame (test in the question):

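# make a list to hold the unique IDs by period
# (assumes periods are the consecutive integers 1, 2, ..., max(df$period))
uniques = list()
for(i in 1:max(df$period)){
  uniques[[i]] = unique(df$user_id[df$period == i])
}

# hold the retention rates; the first period has no predecessor, so it stays NA
retentions = rep(NA, times = max(df$period))

# share of the previous period's unique users who are also in period j
# (assumes there are at least two periods)
for(j in 2:max(df$period)){
  retentions[j] = mean(uniques[[j-1]] %in% uniques[[j]])
}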

Basically, %in% creates a logical vector indicating whether each element of the first argument is in the second. Taking the mean of that vector gives the proportion that is TRUE.
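To make the %in% step concrete, here is a minimal sketch that first applies it to periods 1 and 2 of the question's data and then wraps the same calculation in a helper that returns the table the question asks for. The helper name simple_retention is hypothetical and not part of the original answer, and df is rebuilt here from the dput output so the block runs on its own.

```r
# Data from the question (called `test` there, `df` in the answer).
df <- data.frame(
  user_id = c(1, 5, 1, 3, 4, 2, 3, 2, 3, 1, 5, 5, 2, 1, 4, 5, 5, 5, 4, 3),
  period  = rep(1:4, each = 5)
)

# Unique users in periods 1 and 2.
p1 <- unique(df$user_id[df$period == 1])  # 1 5 3 4
p2 <- unique(df$user_id[df$period == 2])  # 2 3 1

# One TRUE/FALSE per period-1 user: were they also active in period 2?
p1 %in% p2        # TRUE FALSE TRUE FALSE
mean(p1 %in% p2)  # 0.5

# Hypothetical helper (an assumption, not from the answer): the same
# calculation for every pair of adjacent periods present in the data.
simple_retention <- function(data) {
  periods <- sort(unique(data$period))
  users <- lapply(periods, function(p) unique(data$user_id[data$period == p]))
  rate <- rep(NA_real_, length(periods))  # first period stays NA
  for (k in seq_along(periods)[-1]) {
    rate[k] <- mean(users[[k - 1]] %in% users[[k]])
  }
  data.frame(period = periods, retention = rate)
}

output <- simple_retention(df)
print(output)
```

Because the helper iterates over the periods actually present rather than 1:max(period), it compares each period with the previous observed one. If a period can be missing entirely from the data, that may or may not be the comparison you want.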
