Cumulative count of unique values in R


Question

A simplified version of my data set looks like this:

depth value
   1     a
   1     b
   2     a
   2     b
   2     b
   3     c

I would like to build a new data set that gives, for each value of "depth", the cumulative number of unique values seen so far, counting from the top. For example:

depth cumsum
 1      2
 2      2
 3      3

Any ideas on how to do this? I am relatively new to R.

Recommended answer

I find this a perfect case for using factor and setting the levels carefully. I'll use data.table here to implement the idea. It helps if your value column is character, though that is not an absolute requirement.

  • step 1: Convert your data.frame to a data.table, keeping only the unique rows.

require(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth") # just to be sure before factoring "value"

  • step 2: Convert value to a factor and coerce it to numeric. Be sure to set the levels yourself, in order of first appearance (this is important).

    dt[, id := as.numeric(factor(value, levels = unique(value)))]
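
To see why setting the levels by hand matters, here is a small base R sketch. The vector `v` is made up for illustration and is not part of the question's data. With `levels = unique(v)` the numeric codes follow the order of first appearance, so a code of k marks the k-th distinct value encountered; with the default (sorted) levels that property is lost.

```r
# Hypothetical vector, not from the question's data
v <- c("b", "a", "b", "c")

# Levels in order of first appearance: b = 1, a = 2, c = 3
id_first_seen <- as.numeric(factor(v, levels = unique(v)))

# Default levels are sorted alphabetically: a = 1, b = 2, c = 3
id_sorted <- as.numeric(factor(v))

print(id_first_seen)  # 1 2 1 3
print(id_sorted)      # 2 1 2 3
```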
    

  • step 3: Set the key columns to depth and id for subsetting, then pick only the last value for each depth.

     setkey(dt, "depth", "id")
     dt.out <- dt[J(unique(depth)), mult="last"][, value := NULL]
    
    #    depth id
    # 1:     1  2
    # 2:     2  2
    # 3:     3  3
    

  • step 4: Since the count at each depth must be at least as large as the count at the previous depth, use cummax to get the final output.

    dt.out[, id := cummax(id)]
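
A short sketch of why cummax is needed, using a hand-written vector rather than the data.table itself. The numbers are assumed to be the largest first-appearance id found at each depth in the trickier example further down (depths 1 to 6); the variable names are illustrative only. The last depth contains only "a", whose id is 1, so without cummax the count would wrongly drop.

```r
# Largest first-appearance id at each depth (depths 1..6), written by hand
max_id_per_depth <- c(2, 4, 4, 5, 6, 1)

# Running maximum: the count never decreases as depth increases
running_count <- cummax(max_id_per_depth)

print(running_count)  # 2 4 4 5 6 6
```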
    

  • Edit: The above code was for illustrative purposes. In reality you don't need a third column at all. This is how I'd write the final code.

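    library(data.table)
    dt <- as.data.table(unique(df))  # drop duplicate (depth, value) rows
    setkey(dt, "depth")              # sort by depth before numbering values
    # number each value by its order of first appearance
    dt[, value := as.numeric(factor(value, levels = unique(value)))]
    setkey(dt, "depth", "value")
    # keep the largest number within each depth
    dt.out <- dt[J(unique(depth)), mult="last"]
    # the count can never decrease as depth increases
    dt.out[, value := cummax(value)]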
    

    Here's a trickier example, followed by the output from the code:

    df <- structure(list(depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6), 
                    value = structure(c(1L, 2L, 3L, 4L, 1L, 3L, 4L, 5L, 6L, 1L, 1L), 
                    .Label = c("a", "b", "c", "d", "f", "g"), class = "factor")), 
                    .Names = c("depth", "value"), row.names = c(NA, -11L), 
                    class = "data.frame")
    #    depth value
    # 1:     1     2
    # 2:     2     4
    # 3:     3     4
    # 4:     4     5
    # 5:     5     6
    # 6:     6     6
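
As a cross-check on the output above, the same counts can be reproduced in base R. This is a different technique from the factor approach in this answer: it flags the first occurrence of each value with duplicated, counts the flags per depth, and accumulates them with cumsum. The data frame is rebuilt with data.frame() so the sketch is self-contained, and the names `is_new`, `new_per_depth`, and `out` are illustrative.

```r
# Same data as the trickier example, rebuilt with data.frame()
df <- data.frame(
  depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
  value = c("a", "b", "c", "d", "a", "c", "d", "f", "g", "a", "a"),
  stringsAsFactors = FALSE
)

# Sort by depth so "first occurrence" means "shallowest occurrence"
df <- df[order(df$depth), ]

# TRUE where a value appears for the first time
is_new <- !duplicated(df$value)

# Count new values per depth, then accumulate down the depths
new_per_depth <- tapply(is_new, df$depth, sum)
out <- data.frame(
  depth  = as.numeric(names(new_per_depth)),
  cumsum = as.vector(cumsum(new_per_depth))
)

print(out)
```

The resulting cumsum column should agree with the value column printed by the data.table code.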
    
