Cumulative count of values in R


Question

I hope you are doing very well. I would like to know how to calculate a cumulative count over a data set under certain conditions. A simplified version of my data set looks like this:


t   id  
A   22
A   22
R   22
A   41
A   98
A   98
A   98
R   98
A   46
A   46
R   46
A   46
A   46
A   46
R   46
A   46
A   12
R   54
A   66
R   13 
A   13
A   13
A   13
A   13
R   13
A   13

I would like to build a new data set that gives, for each value of "id", the cumulative number of times that id has appeared, with the count restarting whenever t = R. For example:


t   id  count
A   22  1
A   22  2
R   22  0
A   41  1
A   98  1
A   98  2
A   98  3
R   98  0
A   46  1
A   46  2
R   46  0
A   46  1
A   46  2
A   46  3
R   46  0
A   46  1
A   12  1
R   54  0
A   66  1
R   13  0
A   13  1
A   13  2
A   13  3
A   13  4
R   13  0
A   13  1

Any ideas on how to do this? Thanks in advance.

Recommended answer

Using rle:

out <- transform(df, count = sequence(rle(do.call(paste, df))$lengths))
out$count[out$t == "R"] <- 0
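
To make the one-liner easier to follow, here is a sketch that rebuilds the question's example data and splits the expression into steps. The intermediate names key and runs are introduced here only for illustration; the logic is the same as in the two lines above.

```r
# Rebuild the example data from the question (columns t and id).
df <- data.frame(
  t  = c("A","A","R","A","A","A","A","R","A","A","R","A","A",
         "A","R","A","A","R","A","R","A","A","A","A","R","A"),
  id = c(22, 22, 22, 41, 98, 98, 98, 98, 46, 46, 46, 46, 46,
         46, 46, 46, 12, 54, 66, 13, 13, 13, 13, 13, 13, 13)
)

key  <- do.call(paste, df)   # one string per row, e.g. "A 22"
runs <- rle(key)             # lengths of runs of consecutive identical keys
out  <- transform(df, count = sequence(runs$lengths))  # 1..n within each run
out$count[out$t == "R"] <- 0 # R rows reset the count

out$count
```

Each run of identical (t, id) pairs is numbered from 1, so a change in either column restarts the count; the last line then overwrites the R rows with 0.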

If your data.frame has more than these two columns and you want to check only these two, replace df in the do.call(paste, df) call with df[, 1:2] or df[, c("t", "id")].

If you find do.call(paste, df) dangerous (as @flodel comments), you can replace it with:

as.character(interaction(df))

I personally don't find anything dangerous or clumsy about this setup, as long as you use the right separator, which means knowing your data well. If you do, the second solution may help.
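
The separator concern can be illustrated with a small sketch. The data frame d and its values are made up for this example, and "\r" is only one possible choice of a separator assumed never to occur in the data.

```r
# Hypothetical data: two different rows that collapse to the same key
# when pasted with the default separator " ".
d <- data.frame(a = c("x y", "x"), b = c("z", "y z"), stringsAsFactors = FALSE)

k_default <- do.call(paste, d)                 # both rows become "x y z"
k_safe    <- do.call(paste, c(d, sep = "\r"))  # keys stay distinct

k_default[1] == k_default[2]
k_safe[1] == k_safe[2]
```

With the example in the question (single-letter t and numeric id) the default separator cannot produce such a collision, which is why knowing the data matters here.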

For those who don't like using do.call(paste, df) or as.character(interaction(df)) (please see the comment exchanges between me, @flodel and @HongOoi), here's another base solution:

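idx <- which(df$t == "R")   # row positions of the reset markers
ww <- NULL
if (length(idx) > 0) {
    # block sizes: rows up to and including each R, then the rows after the last R
    ww <- c(min(idx), diff(idx), nrow(df) - max(idx))
    # within each block, number consecutive runs of the same id
    df <- transform(df, count = ave(id, rep(seq_along(ww), ww),
                   FUN = function(y) sequence(rle(y)$lengths)))
    df$count[idx] <- 0      # R rows reset the count
} else {
    # no R rows: still restart the count whenever id changes
    df$count <- sequence(rle(df$id)$lengths)
}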
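
The grouping step may be easier to see on a toy vector. In this sketch t_col and grp are hypothetical names used only here; idx and ww are computed the same way as in the solution above.

```r
t_col <- c("A", "A", "R", "A", "R", "A")   # hypothetical toy column

idx <- which(t_col == "R")                               # positions of R: 3 5
ww  <- c(min(idx), diff(idx), length(t_col) - max(idx))  # block sizes: 3 2 1
grp <- rep(seq_along(ww), ww)                            # block labels: 1 1 1 2 2 3

grp
```

Every block ends at an R row, and whatever follows the last R forms a final block, so ave() can count runs of id separately inside each block.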
