Function to track the changes in a field

Problem description

I need a function (using base SAS or RStudio) that will enable me to determine the ID numbers as of a certain date and the original (root) ID numbers as of the start date. The dataset includes the old ID, the new ID, and the date the ID changed. Example data:

OldID  NewID  ChangeDate
1      2      1/1/10
10     11     1/1/10
2      3      7/1/10
3      4      7/10/10
11     12     8/1/10

I need to know the ID numbers as of 7/15/10 and the original (root) ID (as of 1/1/10). The output should look like this:

OrigID  LastID
1       4
10      11

I will then need a flag that will help me count the number of OrigIDs that changed over the given time interval (in this case, 1/1/10 to 7/15/10). I need to do similar counts for multiple dates after 7/15/10 as well.

Is there a function in base SAS or RStudio that can do this?

The SAS/R functions I researched (hierarchic loggers, synchronous tracking, sequence tracking functions; e.g., logger, lumberjack, log4r, validate, futile.logger) don't appear to do this.

Recommended answer

This should work; I was just too lazy to type proper dates, so ChangeDate holds plain integers in the example below.

Note: this assumes the data is sorted in the order the changes occurred.
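If the data is not already in that order, it can be sorted first. The following is a minimal sketch, assuming the dates arrive as m/d/yy text as in the question; the shuffled row order here is made up for illustration.

```r
# Example rows from the question, deliberately shuffled (illustrative only)
df <- data.frame(
  OldID = c(11, 3, 1, 2, 10),
  NewID = c(12, 4, 2, 3, 11),
  ChangeDate = c("8/1/10", "7/10/10", "1/1/10", "7/1/10", "1/1/10")
)

# Parse the m/d/yy text into Date objects so they compare chronologically
df$ChangeDate <- as.Date(df$ChangeDate, format = "%m/%d/%y")

# Sort by change date; rows with the same date keep their relative order
df <- df[order(df$ChangeDate), ]
rownames(df) <- NULL
df
```

With real Date values in ChangeDate, the from and to arguments would also need to be Date objects, for example as.Date("2010-07-15").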

Data

df <- data.frame(
  OldID = c(1, 10, 2, 3, 11), NewID = c(2, 11, 3, 4, 12), ChangeDate = c(1, 1, 2, 2, 3))
df
#>   OldID NewID ChangeDate
#> 1     1     2          1
#> 2    10    11          1
#> 3     2     3          2
#> 4     3     4          2
#> 5    11    12          3

Function

process <- function(df, from, to) {
  process0 <- function(df, i = 1){
    # fetch new value
    new <- df$NewID[i]
    # check in old column
    j <- match(new, df$OldID)
    
    if(is.na(j)) {
      # if not matched, set i to next row
      i <- i + 1
    } else {
      # else we update current row with new "new" value
      df$NewID[i] <- df$NewID[j]
      # and increment the changes
      df$Changes[i] <- df$Changes[i] + 1
      # and remove obsolete row
      df <- df[-j,]
    }
    # do it all over again except if there is no next row
    if(i <= nrow(df)) process0(df, i) else df
  }
  # filter data frame
  df <- subset(df, ChangeDate >= from & ChangeDate <= to, select = c("OldID", "NewID"))
  # start with 1 change per line
  df$Changes <- 1
  # run recursive function
  process0(df)
}

Result

process(df, 1, 2)
#>   OldID NewID Changes
#> 1     1     4       3
#> 2    10    11       1
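For the question's real dates and its request for counts at several cut-off dates, the same idea can be sketched without recursion. This is not the answer's function: trace_ids is a hypothetical helper written for illustration, and it assumes each ID is renamed at most once at a time and that a rename only applies to rows later in date order.

```r
# Hypothetical helper: follow each root ID through the renames in [from, to]
trace_ids <- function(df, from, to) {
  # Keep only changes inside the window, in chronological order
  d <- df[df$ChangeDate >= from & df$ChangeDate <= to, ]
  d <- d[order(d$ChangeDate), ]
  # Roots are old IDs that never appear as a new ID within the window
  roots <- setdiff(d$OldID, d$NewID)
  out <- data.frame(OrigID = roots, LastID = roots,
                    Changes = rep(0, length(roots)))
  for (k in seq_len(nrow(out))) {
    pos <- 0  # index of the last rename applied; only later rows may match
    repeat {
      j <- which(seq_len(nrow(d)) > pos & d$OldID == out$LastID[k])[1]
      if (is.na(j)) break
      out$LastID[k] <- d$NewID[j]
      out$Changes[k] <- out$Changes[k] + 1
      pos <- j
    }
  }
  out
}

# Example data from the question, with proper dates
df <- data.frame(
  OldID = c(1, 10, 2, 3, 11),
  NewID = c(2, 11, 3, 4, 12),
  ChangeDate = as.Date(c("1/1/10", "1/1/10", "7/1/10", "7/10/10", "8/1/10"),
                       format = "%m/%d/%y")
)

start <- as.Date("2010-01-01")
res <- trace_ids(df, start, as.Date("2010-07-15"))
res
#>   OrigID LastID Changes
#> 1      1      4       3
#> 2     10     11       1

# Number of original IDs that changed, for several cut-off dates
# (the second cut-off is an arbitrary later date chosen for illustration)
cutoffs <- as.Date(c("2010-07-15", "2010-08-15"))
counts <- vapply(seq_along(cutoffs),
                 function(i) nrow(trace_ids(df, start, cutoffs[i])),
                 integer(1))
counts
```

Every row returned has at least one change inside the window, so the row count doubles as the count of OrigIDs that changed. Because pos only moves forward, the loop terminates even if the data contains an ID that is later renamed back to an earlier value.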

Created on 2021-06-09 by the reprex package (v0.3.0)
