Cumulative sequence of occurrences of values

Question

I have a dataset that looks something like this, with a column that can have four different values:

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))

In R, I'd like to create a second column that tallies, in sequence, the cumulative number of rows containing a particular value. Thus the output column would look like this:

c(1, 1, 1, 2, 1, 2, 2, 3, 2, 3, 3, 4)
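To make the requested tally concrete, here is a plain-loop reference sketch that walks the rows in order and keeps a running count per value. The function name running_count() is hypothetical and is not taken from any of the answers; it is only meant to pin down the expected behaviour.

```r
# Hypothetical reference implementation: running count of each value, row by row
running_count <- function(x) {
  x <- as.character(x)            # treat factor levels and strings alike
  seen <- integer(0)              # named tally: one entry per value seen so far
  result <- integer(length(x))
  for (i in seq_along(x)) {
    key <- x[i]
    previous <- if (key %in% names(seen)) seen[[key]] else 0L
    seen[key] <- previous + 1L    # creates the entry on first sight, otherwise increments it
    result[i] <- seen[[key]]
  }
  result
}

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
running_count(dataset$out)
```

A loop like this is fine for checking results, but the vectorised answers below are the practical choice for real data.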

Answer

Try this:

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
with(dataset, ave(as.character(out), out, FUN = seq_along))
# [1] "1" "1" "1" "2" "1" "2" "2" "3" "2" "3" "3" "4"

Of course, you can assign the output to a column in your data.frame using something like dataset$asNumbers <- as.numeric(with(dataset, ave(as.character(out), out, FUN = seq_along))). Note that ave() returns character here, so as.numeric() is needed to get numbers.
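The character result comes from how ave() works: it writes FUN's output back into a copy of its first argument, so the result takes that argument's type. The as.character() call was likely needed because, before R 4.0.0, data.frame() turned strings into factors by default, and assigning integers into a factor produces NA. A minimal sketch of a variant (not from the original answer) that passes the row positions as the first argument instead, so the result is already integer and no as.numeric() is needed:

```r
dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))

# The first argument only supplies the type and length of the result;
# seq_along() then numbers the rows 1, 2, 3, ... within each value of `out`.
dataset$count <- ave(seq_along(dataset$out), dataset$out, FUN = seq_along)
dataset$count
```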

The "dplyr" approach is also quite nice, and its logic is very similar to the "data.table" approach. One advantage is that you don't need to wrap the output in as.numeric, as you would with the ave approach above.

library(dplyr)
dataset %>% group_by(out) %>% mutate(count = sequence(n()))
# Source: local data frame [12 x 2]
# Groups: out
# 
#    out count
# 1    a     1
# 2    b     1
# 3    c     1
# 4    a     2
# 5    d     1
# 6    b     2
# 7    c     2
# 8    a     3
# 9    d     2
# 10   b     3
# 11   c     3
# 12   a     4
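The "data.table" approach mentioned above is not shown on this page. A minimal sketch of what it commonly looks like, assuming the data.table package is installed (it is not used elsewhere in this answer):

```r
library(data.table)

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
dt <- as.data.table(dataset)

# .N is the number of rows in the current group, so seq_len(.N) numbers them 1..N.
# `:=` adds the column by reference, and the rows keep their original order.
dt[, count := seq_len(.N), by = out]
dt
```

As with dplyr, the new column is integer, so no as.numeric() is required.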

A third option is to use getanID from my "splitstackshape" package. For this particular example, you only need to specify the data.frame (since it has a single column). In general, though, you would be more specific and name the column(s) that currently serve as "ids"; the function then checks whether they are unique or whether a cumulative sequence is needed to make them unique.

library(splitstackshape)
# getanID(dataset, "out")  ## Example of being specific about column to use
getanID(dataset)
#     out .id
#  1:   a   1
#  2:   b   1
#  3:   c   1
#  4:   a   2
#  5:   d   1
#  6:   b   2
#  7:   c   2
#  8:   a   3
#  9:   d   2
# 10:   b   3
# 11:   c   3
# 12:   a   4
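The behaviour described for getanID (check whether the id columns are unique, otherwise add a cumulative sequence) can be sketched in base R. add_seq_id() below is a hypothetical helper written for illustration; it is not splitstackshape's actual implementation, and unlike getanID (whose output above is a data.table) it returns a plain data.frame.

```r
# Hypothetical helper mimicking the described getanID() behaviour; not its real source
add_seq_id <- function(df, id_cols = names(df)) {
  # Combine the id columns into one grouping key (drop = TRUE keeps only observed combinations)
  key <- interaction(df[id_cols], drop = TRUE)
  if (anyDuplicated(key) == 0L) {
    message("id columns are already unique")
  }
  # Within each key, number the rows 1, 2, 3, ... in order of appearance
  df$.id <- ave(seq_len(nrow(df)), key, FUN = seq_along)
  df
}

dataset <- data.frame(out = c("a","b","c","a","d","b","c","a","d","b","c","a"))
add_seq_id(dataset, "out")
```

When the ids are already unique every row simply gets .id equal to 1, which is also what a per-group cumulative count yields for groups of size one.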
