R, dplyr: cumulative version of n_distinct

Views: 94 | Published: 2020/10/26 3:42:20 | Tags: r, dplyr, cumsum

Question

I have a data frame as follows. It is ordered by the column time.

Input:

df = data.frame(time = 1:20,
            grp = sort(rep(1:5,4)),
            var1 = rep(c('A','B'),10)
            )

head(df,10)
   time grp var1
1   1   1    A
2   2   1    B
3   3   1    A
4   4   1    B
5   5   2    A
6   6   2    B
7   7   2    A
8   8   2    B
9   9   3    A
10 10   3    B

I want to create another variable, var2, which holds the number of distinct var1 values seen so far, i.e. up to that point in time, within each group grp. This is a little different from what I'd get if I were to use n_distinct, which returns a single count for the whole group.

Expected output:

   time grp var1 var2
1   1   1    A    1
2   2   1    B    2
3   3   1    A    2
4   4   1    B    2
5   5   2    A    1
6   6   2    B    2
7   7   2    A    2
8   8   2    B    2
9   9   3    A    1
10 10   3    B    2

I want to create a function for this, say cum_n_distinct, and use it as follows:

d_out = df %>%
  arrange(time) %>%
  group_by(grp) %>%
  mutate(var2 = cum_n_distinct(var1))

Recommended answer

Assuming the data is already ordered by time, first define a cumulative distinct function:
dist_cum <- function(var)
  sapply(seq_along(var), function(x) length(unique(head(var, x))))
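
The helper above recomputes unique() for every prefix of the vector. As a minimal sketch of an alternative (not part of the original answer), the same running count can be obtained from base R's duplicated() and cumsum(). The name cum_n_distinct is simply the hypothetical name requested in the question.

```r
# Hypothetical helper, named after the function the question asks for.
# duplicated() flags values that already appeared earlier in the vector,
# so !duplicated() marks the first occurrence of each value, and cumsum()
# turns those marks into a running count of distinct values.
cum_n_distinct <- function(var) cumsum(!duplicated(var))

cum_n_distinct(c("A", "B", "A", "B"))
# [1] 1 2 2 2
```

It should be usable anywhere dist_cum is used in the solutions here, and it works on character vectors and factors alike.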

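Then a base R solution that uses ave to split the data by group and apply our function to each group. ave expects a numeric vector, so var1 is converted to integer codes with factor(); since R 4.0.0, data.frame() no longer turns strings into factors by default, so as.integer(var1) on its own would return NA:
transform(df, var2 = ave(as.integer(factor(var1)), grp, FUN = dist_cum))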

A data.table solution, basically doing the same thing:
library(data.table)
(data.table(df)[, var2:=dist_cum(var1), by=grp])

And dplyr, again the same thing:
library(dplyr)
df %>% group_by(grp) %>% mutate(var2=dist_cum(var1))
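
All three solutions rely on the rows already being sorted by time. The sketch below, in base R only, shows one way to restore that order first and then compare the result with the expected output from the question. The shuffling step and the names shuffled and ordered are illustrative assumptions, not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(time = 1:20,
                 grp  = sort(rep(1:5, 4)),
                 var1 = rep(c("A", "B"), 10))

# Same helper as in the answer, repeated so this block runs on its own.
dist_cum <- function(var)
  sapply(seq_along(var), function(x) length(unique(head(var, x))))

# Shuffle the rows to mimic unsorted input, then restore time order.
set.seed(1)
shuffled <- df[sample(nrow(df)), ]
ordered  <- shuffled[order(shuffled$time), ]

# Pass row indices to ave() and look up var1 inside FUN; this avoids
# converting var1 to integer codes first.
ordered$var2 <- ave(seq_len(nrow(ordered)), ordered$grp,
                    FUN = function(i) dist_cum(ordered$var1[i]))

# Each group sees A, B, A, B, so the running count should be 1, 2, 2, 2.
all(ordered$var2 == rep(c(1, 2, 2, 2), 5))
```

In the dplyr version the same effect comes from calling arrange(time) before group_by(grp), as in the question's own pipeline.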

