Cumulative count of unique values in R
Question
A simplified version of my data set looks like this:
depth value
1 a
1 b
2 a
2 b
2 b
3 c
I would like to build a new data set that gives, for each value of "depth", the cumulative number of unique values, starting from the top. For example:
depth cumsum
1 2
2 2
3 3
Any ideas on how to do this? I am relatively new to R.
Recommended answer
I find this a perfect case for using factor and setting its levels carefully. I'll use data.table here to implement the idea. Make sure your value column is character (not an absolute requirement).
Step 1: Convert your data.frame to a data.table, keeping only the unique rows.
require(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth") # just to be sure before factoring "value"
Step 2: Convert value to a factor and coerce it to numeric. Make sure to set the levels yourself, in order of first appearance (this is important).
dt[, id := as.numeric(factor(value, levels = unique(value)))]
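Why the levels matter can be seen in a minimal base R sketch. The vector below is hypothetical (it is not the question's data) and is chosen so that alphabetical order differs from order of first appearance. By default, factor() sorts its levels, so the integer codes say nothing about when a value was first seen; with levels = unique(value), each code equals the number of distinct values encountered up to that value's first appearance.

```r
# Hypothetical values, assumed to be already ordered by depth.
value <- c("c", "a", "c", "b")

# Default levels are sorted ("a", "b", "c"), so codes follow the alphabet.
default_ids <- as.integer(factor(value))

# Levels in order of first appearance ("c", "a", "b"), so each code is the
# count of distinct values seen up to that value's first occurrence.
ordered_ids <- as.integer(factor(value, levels = unique(value)))

print(default_ids)
print(ordered_ids)
```

With the sorted default, the first row would already get code 3 even though only one distinct value has been seen at that point.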
Step 3: Set the key columns to depth and id for subsetting, then pick only the last (largest) id for each depth.
setkey(dt, "depth", "id")
dt.out <- dt[J(unique(depth)), mult="last"][, value := NULL]
# depth id
# 1: 1 2
# 2: 2 2
# 3: 3 3
Step 4: The count at each depth must be at least as large as the count at the previous depth, so apply cummax to get the final output.
dt.out[, id := cummax(id)]
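The need for cummax shows up when a deeper level contains only values that were already seen earlier. A small base R sketch, using hypothetical per-depth maxima rather than the question's data:

```r
# Hypothetical largest id found at each of six depths. The last depth holds
# only a value first seen at depth 1, so its own maximum id is just 1.
max_id_per_depth <- c(2, 4, 4, 5, 6, 1)

# cummax carries the running maximum forward, so the count never decreases.
cumulative_unique <- cummax(max_id_per_depth)

print(cumulative_unique)
```

Without this step, the final depth would report 1 instead of the 6 distinct values seen so far.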
Edit: The code above was for illustration only. In practice you don't need a third column at all. This is how I'd write the final code:
require(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth")
dt[, value := as.numeric(factor(value, levels = unique(value)))]
setkey(dt, "depth", "value")
dt.out <- dt[J(unique(depth)), mult="last"]
dt.out[, value := cummax(value)]
Here's a trickier example, followed by the output of the code:
df <- structure(list(depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
value = structure(c(1L, 2L, 3L, 4L, 1L, 3L, 4L, 5L, 6L, 1L, 1L),
.Label = c("a", "b", "c", "d", "f", "g"), class = "factor")),
.Names = c("depth", "value"), row.names = c(NA, -11L),
class = "data.frame")
# depth value
# 1: 1 2
# 2: 2 4
# 3: 3 4
# 4: 4 5
# 5: 5 6
# 6: 6 6
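As a cross-check on that output, the same numbers can be reproduced in base R without data.table. This is an alternative technique, not the answer's method: it flags the first occurrence of each value with duplicated(), takes a running total, and keeps the largest total per depth. The data frame is a plain rewrite of the trickier example with value stored as character.

```r
# Same data as the trickier example, with `value` as plain character.
df <- data.frame(
  depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
  value = c("a", "b", "c", "d", "a", "c", "d", "f", "g", "a", "a")
)

# The running count assumes rows are ordered by depth.
df <- df[order(df$depth), ]

# TRUE marks the first occurrence of a value; the running sum is the number
# of distinct values seen so far.
seen <- cumsum(!duplicated(df$value))

# Keep the largest running count within each depth.
result <- data.frame(
  depth = unique(df$depth),
  cumsum = as.vector(tapply(seen, df$depth, max))
)

print(result)
```

Because the running count never decreases, this variant needs no separate cummax step.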