Using Countif on Dates in R


Question

**A**  | **B**  | **C** |**D** |
:----: | :----: | :----:|:----:|
1/1/17 | 3/1/17 |4/1/17 | H    |
1/1/17 | 3/1/17 |4/1/17 | H    |
2/1/17 | 4/1/17 |5/1/17 | V    |
3/1/17 | 5/1/17 |6/1/17 | V    |
4/1/17 | 5/1/17 |7/1/17 | H    |
4/1/17 | 6/1/17 |7/1/17 | H    |








I am looking for R code that produces the result shown in the table below:

 1. A column with the unique list of dates from columns A, B and C above.
 2. A count of dates <= (less than or equal to) the value in the unique
    dates column, for each of the columns A, B and C from the table above.
 3. Counts filtered to rows where column D has the value 'H' only.






Result

**Unique Dates** | **Count of A** | **Count of B** | **Count of C** |
:----: | :----: | :----: | :----: |
1/1/17 | 2 | 0 | 0 |
2/1/17 | 2 | 0 | 0 |
3/1/17 | 2 | 2 | 0 |
4/1/17 | 4 | 2 | 2 |
5/1/17 | 4 | 3 | 2 |
6/1/17 | 4 | 4 | 2 |
7/1/17 | 4 | 0 | 4 |


Recommended answer


At first glance, the question seems to be a simple reshaping task. A closer look shows that the requirements are not easy to implement if we want to follow the OP's specifications to the letter:



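  1. A column with the unique list of dates from columns A, B and C above.

  2. A count of dates <= (less than or equal to) the value in the unique
    dates column, for each of the columns A, B and C.

  3. Filtered by a column D value of 'H' only.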



The data.table solution below reshapes the data from wide to long format, does all aggregations by group in the long format (including filling in combinations of D, variable and date that are missing there), and finally reshapes the result back to wide format. Additional explanations are given in the comments within the code.

library(data.table)   # CRAN version 1.10.4 used
# coerce to data.table
setDT(DT)[
  # reshape from wide to long format, 
  # thereby renaming one column as requested
  , melt(.SD, id.vars = "D", value.name = "Unique_Dates")][
    # convert dates from character to class Date
    , Unique_Dates := lubridate::dmy(Unique_Dates)][
      # count occurrences by variable & date, 
      # set key & order by variable & date for subsequent cumsum & join
      , .N, keyby = .(D, variable, Unique_Dates)][
        # compute cumsum for each variable along unique dates
        , N := cumsum(N), by = .(D, variable)][
          # join with all possible combinations of D, variables and dates
          # use rolling join to fill missing values
          CJ(D, variable, Unique_Dates, unique = TRUE), roll = Inf][
            # replace remaining NAs
            is.na(N), N := 0L][
              # finally, reshape selected rows from long to wide
              D == "H", dcast(.SD, Unique_Dates ~ paste0("Count_of_", variable), value.var = "N")]



   Unique_Dates Count_of_A Count_of_B Count_of_C
1:   2017-01-01          2          0          0
2:   2017-01-02          2          0          0
3:   2017-01-03          2          2          0
4:   2017-01-04          4          2          2
5:   2017-01-05          4          3          2
6:   2017-01-06          4          4          2
7:   2017-01-07          4          4          4
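
The rolling join step (`roll = Inf`) in the chain above can be looked at in isolation. The following is a minimal sketch on toy data, not part of the original answer: the table names `known` and `grid` are made up for illustration, and the values simply mirror the cumulative counts of column B for D == "H". It shows how the last known cumulative count is carried forward to dates that have no row of their own, and why the dates before the first observation still need the separate `is.na(N)` replacement.

```r
library(data.table)

# Cumulative counts that exist only for some dates (hypothetical toy data)
known <- data.table(
  date = as.Date(c("2017-01-03", "2017-01-05", "2017-01-06")),
  N    = c(2L, 3L, 4L),
  key  = "date"
)

# Every date we want to report on (hypothetical grid)
grid <- data.table(
  date = seq(as.Date("2017-01-01"), as.Date("2017-01-07"), by = "day")
)

# roll = Inf carries the last observation forward without limit;
# dates before the first observation have nothing to roll from and stay NA
filled <- known[grid, on = "date", roll = Inf]

# replace the remaining leading NAs by zero, as in the answer's chain
filled[is.na(N), N := 0L]
filled
```

In this sketch the rolling join plays the role that `zoo::na.locf()` would play after an ordinary join: it fills the gaps in one step while joining.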





    • The columns are named according to the OP's expected result.
    • The result includes 2017-01-02 as expected, although this date appears only in a row with D == "V", which was supposed to be excluded from the final result.
    • A rolling join is used to fill missing values instead of zoo::na.locf().
    • In the question, the OP provided the sample data in a printed format which was difficult to "scrape":

      library(data.table)
      DT <- fread(
        "**A**  | **B**  | **C** |**D** |
        1/1/17 | 3/1/17 |4/1/17 | H    |
        1/1/17 | 3/1/17 |4/1/17 | H    |
        2/1/17 | 4/1/17 |5/1/17 | V    |
        3/1/17 | 5/1/17 |6/1/17 | V    |
        4/1/17 | 5/1/17 |7/1/17 | H    |
        4/1/17 | 6/1/17 |7/1/17 | H    |",
        sep = "|", drop = 5L, stringsAsFactors = TRUE)
      # strip the markdown asterisks from the column names (by reference);
      # done after the assignment because DT does not exist inside the fread() call
      setnames(DT, stringr::str_replace_all(names(DT), "\\*", ""))
      DT
      



              A      B      C D
      1: 1/1/17 3/1/17 4/1/17 H
      2: 1/1/17 3/1/17 4/1/17 H
      3: 2/1/17 4/1/17 5/1/17 V
      4: 3/1/17 5/1/17 6/1/17 V
      5: 4/1/17 5/1/17 7/1/17 H
      6: 4/1/17 6/1/17 7/1/17 H
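
As a cross-check, the three requirements can also be written as a literal "countif" in base R. This is only a sketch and not the answer's method: the sample data is retyped by hand as a data.frame, the dates are assumed to be day/month/year (as the `dmy()` call in the answer implies), and names such as `h_rows` and `count_le` are made up for illustration. The unique dates are taken from all rows, while the counts use only the rows with D == "H".

```r
# Sample data from the question, retyped by hand (assumed day/month/year)
df <- data.frame(
  A = c("1/1/17", "1/1/17", "2/1/17", "3/1/17", "4/1/17", "4/1/17"),
  B = c("3/1/17", "3/1/17", "4/1/17", "5/1/17", "5/1/17", "6/1/17"),
  C = c("4/1/17", "4/1/17", "5/1/17", "6/1/17", "7/1/17", "7/1/17"),
  D = c("H", "H", "V", "V", "H", "H"),
  stringsAsFactors = FALSE
)
date_cols <- c("A", "B", "C")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%d/%m/%y")

# Requirement 1: unique dates from columns A, B and C (all rows)
unique_dates <- sort(unique(do.call(c, unname(as.list(df[date_cols])))))

# Requirement 3: only rows with D == "H" contribute to the counts
h_rows <- df[df$D == "H", ]

# Requirement 2: for one column, count the dates <= each unique date
count_le <- function(col) {
  vapply(seq_along(unique_dates),
         function(i) sum(h_rows[[col]] <= unique_dates[i]),
         integer(1))
}

counts <- sapply(date_cols, count_le)   # 7 x 3 integer matrix
result <- data.frame(Unique_Dates = unique_dates, counts)
names(result)[-1] <- paste0("Count_of_", date_cols)
result
```

The nested loop compares every unique date with every date in each column, which is fine for a handful of rows; the cumulative sum plus rolling join in the data.table solution avoids these repeated comparisons on larger data.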
      

